Methods and Compositions for Modulating
Morphogenic Protein Expression 

Reference to Related Applications 

This application is a continuation-in-part of USSN 08/255,250,
filed June 7, 1994, which is a continuation-in-part of
USSN 07/938,021, filed August 28, 1992, which is a
continuation-in-part of USSN 07/752,861, filed August 30, 1991,
which is a continuation-in-part of USSN 07/667,274, filed
March 11, 1991, the disclosures of which are incorporated herein
by reference.

Field of the Invention 

The invention relates generally to the field of drug screening
assays. More particularly, the invention relates to methods and
compositions for identifying molecules that modulate production of
true tissue morphogenic proteins.

Background of the Invention

A class of proteins recently has been identified, the members
of which are true tissue morphogenic proteins. The members of
this class of proteins are characterized as competent for inducing
the developmental cascade of cellular and molecular events that
culminate in the formation of new organ-specific tissue, including
any vascular and connective tissue formation as required by the
naturally occurring tissue. Specifically, the morphogens are
competent for inducing all of the following biological functions
in a morphogenically permissive environment: (1) stimulating
proliferation of progenitor cells; (2) stimulating differentiation
of progenitor cells; (3) stimulating the proliferation of
differentiated cells; and (4) supporting the growth and maintenance
of differentiated cells. For example, the morphogenic proteins
can induce the full developmental cascade of bone tissue
morphogenesis, including the migration and proliferation of
mesenchymal cells, proliferation and differentiation of
chondrocytes, cartilage matrix formation and calcification,
vascular invasion, osteoblast proliferation, bone formation, bone
remodeling, and hematopoietic bone marrow differentiation. These
proteins also have been shown to induce true tissue morphogenesis
of non-chondrogenic tissue, including dentin, liver, and nerve
tissue.

A particularly useful tissue morphogenic protein is human OP-1
(Osteogenic Protein-1), described in U.S. 5,011,691; US Pat. No.
5,266,683 and Ozkaynak et al. (1990) EMBO J. 9:2085-2093.
Species homologues identified to date include mouse OP-1 (see US
Pat. 5,266,683) and the Drosophila homologue 60A, described in
Wharton et al. (1991) PNAS 88:9214-9218. Other closely related
proteins include OP-2 (Ozkaynak (1992) J. Biol. Chem.
267:25220-25227 and US Pat. No. 5,266,683); BMP5, BMP6 (Celeste
et al. (1991) PNAS 87:9843-9847) and Vgr-1 (Lyons et al. (1989)).
These disclosures are incorporated herein by reference.

It previously has been contemplated that these tissue
morphogens can be administered to an animal to regenerate lost or
damaged tissue. Alternatively, one can envision administering a
molecule capable of modulating expression of the endogenous tissue
morphogen as a means for providing morphogen to a site in vivo.

It is an object of this invention to provide compositions and
methods of screening compounds which can modulate expression of an
endogenous tissue morphogen, particularly OP-1 and closely related
genes. The compounds thus identified have utility both in vitro
and in vivo. Useful compounds contemplated include those capable
of stimulating transcription and/or translation of the OP-1 gene,
as well as compounds capable of inhibiting transcription and/or
translation of the OP-1 gene.

These and other objects and features of the invention will be 
apparent from the description, drawings and claims which follow. 

Summary of the Invention

The invention features compositions and methods for screening
candidate compounds for the ability to modulate the effective
local or systemic quantity of endogenous OP-1 in an organism, and
methods for producing the compounds identified. In one aspect,
the method is practiced by: (1) incubating one or more candidate
compounds with cells transfected with a DNA sequence encoding, in
operative association with a reporter gene, a portion of an OP-1
non-coding DNA sequence that is competent to act on and affect
expression of the associated reporter gene; (2) measuring the
level of reporter gene expression in the transfected cell; and (3)
comparing the level of reporter gene expressed in the presence of
the candidate compound with the level of reporter gene expressed
in the absence of the candidate compound. In a related aspect,
the invention features the compound that is identified by use of
the method of the invention.
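Step (3) of the method, the comparison of reporter levels with and
without the candidate compound, can be sketched as a simple ratio.
The following Python fragment is only an illustration under stated
assumptions: the function names, the replicate readings and the
1.5-fold cutoff are hypothetical and are not taken from this
disclosure.

```python
from statistics import mean


def reporter_fold_change(treated, untreated):
    """Mean reporter signal with compound divided by mean signal without it."""
    if not treated or not untreated:
        raise ValueError("both conditions need at least one reading")
    baseline = mean(untreated)
    if baseline <= 0:
        raise ValueError("untreated baseline must be positive")
    return mean(treated) / baseline


def classify(fold_change, threshold=1.5):
    """Label a compound; the threshold is an illustrative assumption."""
    if fold_change >= threshold:
        return "candidate inducer"
    if fold_change <= 1 / threshold:
        return "candidate inhibitor"
    return "no clear effect"


if __name__ == "__main__":
    # Hypothetical replicate readings (arbitrary units).
    fc = reporter_fold_change([300, 330, 270], [100, 100, 100])
    print(fc, classify(fc))
```

A ratio above the cutoff would flag a compound for follow-up as a
possible inducer, and a ratio below its reciprocal as a possible
inhibitor; in practice the cutoff would be set from assay controls.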

The screening method of the invention provides a simple method
of determining a change in the level of a reporter gene product
expressed by a cell following exposure to one or more compound(s).
The level of an expressed reporter gene product in a given cell
culture, or a change in that level resulting from exposure to one
or more compound(s), indicates that application of the compound can
modulate the level of the morphogen expressed and normally
associated with the non-coding sequence. Specifically, an
increase in the level of reporter gene expression is indicative of
a candidate compound's ability to increase OP-1 expression in
vivo. Similarly, a decrease in the level of reporter gene
expression is indicative of a candidate compound's ability to
decrease or otherwise interfere with OP-1 expression in vivo.

The methods and compositions of the invention can be used to
identify compounds showing promise as therapeutics for various in
vivo and ex vivo mammalian applications, as well as to identify
compounds having numerous utilities. For example, morphogen
expression inducing compounds can be used in vivo to correct or
alleviate a diseased condition, to regenerate lost or damaged
tissue, to induce cell proliferation and differentiation, and/or
to maintain cell and tissue viability and/or a differentiated
phenotype in vivo or ex vivo. The compounds also can be used to
maintain the viability of, and the differentiated phenotype of,
cells in culture. The various in vivo, ex vivo, and in vitro
utilities and applications of the morphogenic proteins described
herein are well documented in the art. See, for example, US
92/01968 (WO 94/03200), filed March 11, 1992; US 92/07358 (WO
93/04692), filed August 28; PCT US 92/0743 (WO 93/05751), filed
August 28, 1992; US 93/07321 (WO 94/03200), filed July 29, 1993;
US 93/08808 (WO 94/06449), filed September 16, 1993; US 93/08885
(WO 94/06420), filed September 15, 1993, and US Pat. No. 5,266,683.

Morphogen expression inhibiting compounds identified by the
methods, kits and compositions described herein can be used to
modulate the degree and/or timing of morphogen expression in a
cell. Such compounds can be used both in vitro and in vivo to
more closely regulate the production and/or available
concentration of morphogen.



List of Useful Terms and Definitions

As used herein, "gene expression" is understood to refer to
the production of the protein product encoded by a DNA sequence of
interest, including the transcription of the DNA sequence and
translation of the mRNA transcript.

As used herein, "operative association" is a fusion of the
described DNA sequences with a reporter gene in such a reading
frame as to be co-transcribed, or at such a relative positioning
as to be competent to modulate expression of the reporter gene.

As used herein, "vector" is understood to mean any nucleic
acid comprising a nucleotide sequence of interest and competent to
be incorporated into a host cell and to recombine with and
integrate into the host cell genome. Such vectors include
linear nucleic acids, plasmids, phagemids, cosmids, YACs (yeast
artificial chromosomes) and the like.

As used herein, "non-coding sequence" or "non-coding DNA"
includes DNA sequences that are not transcribed into RNA sequence,
and/or RNA sequences that are not translated into protein. This
category of "non-coding sequence" has been defined for ease of
reference in the application, and includes sequences occurring 5'
to the ATG site which indicates the start codon and sequences 3'
to the stop codon, as well as intervening intron sequences that
occur within the coding region of the gene. As used herein, an
"OP1-specific" non-coding sequence is understood to define a
non-coding sequence that lies contiguous to OP1-specific coding
sequence at an OP-1 gene locus under naturally-occurring
conditions. The sequences may include 5', 3' and intron
sequences.

As used herein, "allelic, species and other sequence variants
thereof" includes point mutations, insertions and deletions such
as would be naturally occurring or which can be genetically
engineered into an OP-1 non-coding DNA sequence and which do not
affect substantially the regulation of a reporter gene by the OP-1
non-coding sequence. For example, one of ordinary skill in the
art can use site directed mutagenesis to modify, as by deletion,
for example, one or more of the OP-1 non-coding sequences
described herein without substantially affecting the regulation of
OP-1 or a reporter gene by the modification. Such modifications
are considered to be within the scope of the disclosure provided
herein.

As used herein, a "Wt-1/Egr-1 consensus binding sequence" or
"Wt-1/Egr-1 consensus binding element" is a nine base sequence
which has been shown to be bound by the DNA binding proteins Wt-1
and Egr-1. The consensus sequence of the Wt-1/Egr-1 binding site
has been determined by homology to be GNGNGGGNG, Seq. ID No. 4
(Rauscher et al., Science 250:1259-1262 (1990), incorporated
herein by reference).
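By way of illustration only, a degenerate consensus such as
GNGNGGGNG can be located in a DNA string by expanding each
ambiguity code into a character class. The sketch below is an
assumption-laden example, not a method taken from this disclosure:
the function names and the test string are hypothetical, only the
given strand is scanned, and positions are reported as 1-based
inclusive coordinates.

```python
import re

# Standard IUPAC nucleotide codes used here (subset).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Expand an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_sites(sequence, consensus):
    """Return 1-based inclusive (start, end) for every match.

    A lookahead is used so that overlapping matches are also reported.
    """
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    width = len(consensus)
    return [(m.start() + 1, m.start() + width)
            for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical 13-base test string, not taken from any Seq. ID.
    print(find_sites("AAGCGTGGGCGTT", "GNGNGGGNG"))
```

Scanning the reverse complement as well would be a natural
extension if sites on either strand are of interest.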

As used herein, a "TCC binding sequence" or "TCC binding
element" is an approximately 15 to 20 base sequence of DNA which
contains at least three contiguous or non-contiguous repeats of
the DNA sequence TCC. The TCC binding sequence identified in
human OP-1 genomic DNA is shown in Seq. ID No. 5, and the TCC
binding sequence identified in murine OP-1 genomic DNA is shown in
Seq. ID No. 6. The TCC binding sequence has also been shown to be
bound by the DNA binding proteins Wt-1 and Egr-1 (Wang et al.,
Proc. Natl. Acad. Sci. 90:8896-8900 (1993); Wang et al., Biochem
Biophys Res. Comm., 188:433-439 (1992)).
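The definition above (at least three TCC repeats, contiguous or
not, within roughly 15 to 20 bases) lends itself to a sliding
window count. The following sketch is one possible reading of that
definition; the function name, the default window of 20 bases and
the test strings are assumptions made for illustration.

```python
def tcc_windows(sequence, window=20, min_repeats=3):
    """Return 1-based inclusive (start, end) windows holding enough TCC repeats.

    "TCC" cannot overlap itself, so str.count gives the repeat count.
    Sequences shorter than the window are checked as a single window.
    """
    seq = sequence.upper()
    hits = []
    last_start = max(len(seq) - window, 0)
    for start in range(last_start + 1):
        chunk = seq[start:start + window]
        if chunk.count("TCC") >= min_repeats:
            hits.append((start + 1, start + len(chunk)))
    return hits


if __name__ == "__main__":
    # Hypothetical strings, not taken from Seq. ID Nos. 5 or 6.
    print(tcc_windows("TCCTCCTCC"))
    print(tcc_windows("TCCAAAAATCC"))
```

Adjacent windows that satisfy the criterion overlap heavily, so a
caller would typically merge them into a single candidate element.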

As used herein, a "FTZ binding sequence" or "FTZ binding
element" is a Fushi-tarazu DNA sequence (FTZ) that has been shown
to be bound by the DNA binding protein Fushi-tarazu (FTZ-F1). The
FTZ binding sequence identified in human OP-1 genomic DNA is shown
in Seq. ID No. 7. The FTZ consensus sequence, a consensus
sequence for the nuclear hormone receptor superfamily, is
YCAAGGYCR.

As used herein, a "steroid binding sequence" or "steroid
binding element" is a DNA sequence that has been shown to be bound
by one or more elements, in response to activating signal
molecules. Examples of such "activating signal molecules" include
retinoids and Vitamin D, and also include steroids such as estrogen
and progesterone. Useful elements are anticipated to include the
FTZ-F1 protein, Wt-1 and Egr-1. Activating signal molecules of
the nuclear receptor family have recently been shown to bind to
DNA as homodimers, heterodimers or as monomers (Parker, M.G.,
Curr. Op. Cell Biol., 1993, 5:499-504). The formation of
heterodimers among the nuclear receptor family molecules may
significantly increase the diversity of binding elements which are
recognized by these nuclear receptors, and provide for
differential regulation of genes containing the specific binding
sites. In addition, the nuclear receptors have been shown to
interact with other accessory factors, such as transcription
factors, to stimulate or repress transcription. These
interactions, among the nuclear receptors and between the nuclear
receptors and accessory factors, indicate that there could be a
significant number of nuclear receptor/accessory factor
interactions which have widely different transcriptional
activities.

While the method of the invention is described with reference
to a single cell, as will be appreciated by those having ordinary
skill in the art, this is only for ease of description, and the
method is most efficiently carried out using a plurality of cells.

With respect to transfection of DNA sequences in the cell and
the method of the invention, all means for introducing nucleic
acids into a cell are contemplated including, without limitation,
CaPO4 co-precipitation, electroporation, DEAE-dextran mediated
uptake, protoplast fusion, microinjection and lipofusion. A key
to the invention is the DNA sequences with which the cell is
transfected, rather than the mechanical or chemical process by
which the DNA incorporation is accomplished.

Useful reporter genes are characterized as being easy to
transfect into a suitable host cell, easy to detect using an
established assay protocol, and genes whose expression can be
tightly regulated. Reporter genes contemplated to have
utility include, without limitation, the luciferase gene, the
Green Fluorescent Protein (GFP) gene, the chloramphenicol acetyl
transferase (CAT) gene, human growth hormone, and
beta-galactosidase. Additional useful reporter genes are any well
characterized genes the expression of which is readily assayed,
and examples of such reporter genes can be found in, for example,
F.M. Ausubel et al., Eds., Current Protocols in Molecular Biology,
John Wiley & Sons, New York, (1989). As will be appreciated by
those having ordinary skill in the art, the listed reporter genes
are only a few of the possible reporter genes, and it is only for
ease of description that all available reporter genes are not
listed.

While the method, vectors, and cells described recite the use
of a reporter gene in operative association with an OP-1
non-coding DNA sequence, it will be apparent to those of ordinary
skill in the art that the DNA sequence OP-1, including human OP-1,
shown in Seq. ID No. 1 or murine OP-1, disclosed in U.S. Patent
No. 5,266,683, is also within the scope of a suitable reporter
gene. Other suitable reporter genes can be used for ease in
assaying for the presence of the reporter mRNA or reporter gene
product.

Where a cell line is to be established, particularly where the
transfected DNA is to be incorporated into the cell's genome,
lines that can be immortalized are especially desirable. As used
herein, "immortalized" cell lines are viable for multiple passages
(e.g., greater than 50 generations) without significant reduction
in growth rate or protein production.

While the selected non-coding DNA sequences disclosed herein
are described using defined bases, as will be appreciated by those
having ordinary skill in the art, to some degree the lengths of
the selected DNA sequences recited are arbitrary and are defined
for convenience. As will be understood by those of ordinary skill
in the art, shorter sequences of OP-1 non-coding DNA sequence and
other fusion DNAs can be used in a vector according to the
invention, and can be transfected into a cell, or used in the
method of the invention for screening a candidate compound for its
ability to modulate OP-1 expression. Specifically, it is standard
procedure for molecular biologists to first identify useful
regulatory sequences, and then to determine the minimum sequence
required, by systematic digestion and mutagenesis, e.g., by
exonuclease or endonuclease digestion, site directed mutagenesis
and the like. Accordingly, subsequent, standard routine
experimentation is anticipated to identify minimum sequences, and
these shorter sequences are contemplated by the invention
disclosed herein.

Useful cell types for the method and compositions according to
the invention include any eukaryotic cell. Currently preferred
are cell types known to express OP-1. Such cells include
epithelial cells and cells of urogenital cell origin, including
renal (kidney or bladder) cells, as well as liver, bone, nerve,
ovary, cardiac muscle and the like. The cells may be derived from
tissue or cultured from an established cell line. See, for example,
Ozkaynak et al. (1991) Biochem. Biophys. Res. Comm. 179:116-123
for a detailed description of tissues known to express OP-1.
Other useful cells include those known to exhibit a steroid
receptor, including cells having an estrogen receptor and cells
responsive to the FTZ-F1 protein. Currently preferred cells also
have simple media component requirements. Other useful
representative cells include, but are not limited to, Chinese
hamster ovary (CHO); canine kidney (MDCK); or rat bladder (NBT-2),
and the like. Useful cell types can be obtained from the American
Type Culture Collection (ATCC), Rockville, MD or from the European
Collection of Animal Cell Cultures, Porton Down, Salisbury
SP4 0JG, U.K. As used herein, "derived" means the cells are from
the cultured tissue itself, or are a cell line whose parent cells
are of the tissue itself.






Aspects and Embodiments of the Invention 

In one aspect, the invention features a vector having a
reporter gene operatively associated with a portion of one or more
OP-1 non-coding sequences. The OP-1 non-coding sequence chosen is
independently selected from the 5' (or "upstream") non-coding
human or murine OP-1 sequence shown in Seq. ID Nos. 1 and 2,
respectively, the 3' (or "downstream") non-coding human or murine
OP-1 sequence shown in Seq. ID Nos. 1 or 3, and the human intron
non-coding OP-1 sequences shown in Seq. ID No. 1. Also
anticipated to be useful are the non-coding sequences (e.g., 5',
3' and intron) of other species homologs of OP-1 and proteins
closely related to OP-1. In addition, the portion of OP-1
sequence included in the vector can be a combination of two or
more 5' non-coding, 3' non-coding and/or intron OP-1 sequences.

In one embodiment, the vector can include a non-coding
OP1-specific sequence selected from at least one of the following
sequence segments of Seq. ID No. 1 presented below, and which
define human genomic OP-1 sequence comprising approximately 3.3 Kb
of 5' non-coding sequence. In Seq. ID No. 1, the start codon
begins at position 3318, and the upstream sequence (bases 1 to
3317) is composed of untranscribed (1 to 2790) and untranslated
(2791 to 3317) OP1-specific DNA; approximately 1 Kb of which is
presented in Fig. 1 (bottom strand).

Useful sequence segments include bases 2548-3317, representing
750 bases sharing significant sequence identity (greater than 70%)
between the mouse and human OP-1 homologs (See Fig. 1), and bases
3170-3317; 3020-3317; 2790-3317; 2548-2790 of Seq. ID No. 1, all
shorter fragments of this region of the DNA. As base 2790 is the
mRNA start site, other useful sequences include 2790-3317,
representing transcribed but not translated 5' coding sequence and
shorter fragments of this DNA region as noted above; upstream
fragments of OP1-specific DNA, bases 2548-2790; 1549-2790; 1-2790
of Seq. ID No. 1. Also useful sequence segments include the
approximately 750 bases that have homology between the human and
mouse OP-1 sequences with additional upstream sequences, 2300 to
3317; 1300 to 3317; 1-3317; all fragments of the disclosed
upstream OP1-specific DNA sequences of Seq. ID No. 1.
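Because the segments above are given as 1-based inclusive base
positions, pulling a fragment out of a sequence held as a string
requires an offset of one on the start coordinate. The sketch below
illustrates that bookkeeping; the helper names are hypothetical and
the demonstration string is a short placeholder, since Seq. ID
No. 1 itself is not reproduced here.

```python
# A few of the human 5' segments recited above (1-based, inclusive).
HUMAN_5PRIME_SEGMENTS = [
    (2548, 3317), (3170, 3317), (3020, 3317), (2790, 3317), (2548, 2790),
]


def segment_length(start, end):
    """Number of bases in a 1-based inclusive interval."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return end - start + 1


def extract_segment(sequence, start, end):
    """Return bases start..end (1-based, inclusive) of sequence."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("segment lies outside the sequence")
    return sequence[start - 1:end]


if __name__ == "__main__":
    placeholder = "ACGTACGT"  # stand-in for a real genomic sequence
    print(extract_segment(placeholder, 2, 4))
    for start, end in HUMAN_5PRIME_SEGMENTS:
        print(start, end, segment_length(start, end))
```

With the full sequence loaded into a string, each recited fragment
could be extracted the same way before cloning it upstream of a
reporter gene in silico.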

In another embodiment, the sequences are defined by the
non-coding sequences of the mouse OP-1 homolog, including the
following 5' non-coding sequences (Seq. ID No. 2): 2150-2296,
2000-2296, 1788-2296, and 1549-2296, all of which define the 750
bases sharing high sequence identity with the human homolog (See
Fig. 1); 800-2296; 1-2296; 1549-1788, 800-1788 and 1-1788.

Within this region there also exist a number of Egr/Wt-1 sites (8 in
hOP-1; 7 in mOP-1), known in the art to bind the regulatory
elements Egr and Wt-1. Accordingly, in another aspect, the
invention contemplates a screening material for identifying
compounds which modulate OP-1 expression, the assay comprising the
step of identifying compounds which bind an Egr/Wt-1 site. At least
one Wt/Egr-1 element, preferably between 1-6 elements, or at least
6 Wt/Egr-1 elements are included in a sequence. The relative
locations of these elements are indicated in Fig. 1 and at
positions 3192-3200; 3143-3151; 3027-3035; 2956-2964; 2732-2740;
2697-2704 of Seq. ID No. 1, and positions 2003-2011; 1913-1922;
1818-1826; 1765-1776; 1757-1765; 1731-1739; 1699-1707; 1417-1425
of Seq. ID No. 2 of Seq. ID Nos. 1, 2 substantially the same Seq.
alignment. The lengths of bases within these 5' non-coding
sequences are selected to include portions of the sequence of DNA
which was determined to be homologous between murine and human
genomic OP-1, separately and as a part of a larger sequence
including non-homologous DNA. Additionally, the portion of OP-1
sequence selected can be a portion of the region of homology
between murine and human OP-1 DNA sequences, bases 2548-2790 or
2548-3317 of Seq. ID No. 1, or bases 1549 to 1788 or 1549 to 2296
of Seq. ID No. 2, and/or at least one of a Wt-1/Egr-1 consensus
binding sequence. In still another aspect the portion of OP-1
sequence selected can include a TCC binding sequence, a FTZ
binding sequence, a steroid binding sequence, or part or all of an
OP-1 intron sequence. The relative positions of the TCC and FTZ
elements are indicated in Fig. 1 and at positions 2758-2778 (TCC);
2432-2441 (FTZ) of Seq. ID No. 1 and 1755-1769 (TCC) of Seq. ID
No. 2.

In another aspect, the invention features a cell that has been
transfected with a reporter gene in operative association with a
portion of OP-1 non-coding DNA sequence. The portion of OP-1
non-coding sequence is independently selected from the 5' (or
upstream) non-coding human or murine OP-1 sequence shown in Seq.
ID Nos. 1 and 2, the 3' (or downstream) non-coding murine OP-1
sequence shown in Seq. ID No. 3, and the human intron non-coding
OP-1 sequence shown in Seq. ID No. 1. The six human intron
non-coding OP-1 sequences are at bases 3736 to 10700; bases 10897 to
11063; bases 11217 to 11424; bases 11623 to 13358; bases 13440 to
10548; bases 15166 to 17250; all of Seq. ID No. 1. In addition
the portion of OP-1 sequence selected can be a combination of 5'
non-coding, 3' non-coding and/or intron OP-1 sequence. Thus, the
cell can have been transfected with a reporter gene in operative
association with a portion of 5' non-coding OP-1 genomic sequence
that is independently selected from bases 3170 to 3317; 3020 to
3317; 2790 to 3317; 2548 to 3317; 2300 to 3317; 1300 to 3317; 1 to
3317; 2548 to 2790; 1549 to 2790; and 1 to 2790; all of Seq. ID
No. 1 or bases 2150 to 2296; 2000 to 2296; 1788 to 2296; 1549 to
2296; 800 to 2296; 1 to 2296; 1549 to 1788; 800 to 1788; 1 to
1788; all of Seq. ID No. 2. The lengths of bases within these 5'
non-coding sequences are selected to include portions of the
sequence of DNA which was determined to be homologous between
murine and human genomic OP-1, separately and as a part of a
larger sequence including non-homologous DNA. Additionally, the
portion of OP-1 sequence selected can be a portion of the region
of homology between murine and human OP-1 DNA sequences, such as
bases 2548-2790 or 2548-3317 of Seq. ID No. 1, or bases 1549 to
1788 or 1549 to 2296 of Seq. ID No. 2, and at least one of a
Wt-1/Egr-1 consensus binding sequence, a TCC binding sequence, a FTZ
binding sequence, a steroid binding sequence, and an intron. Thus
the portion of OP-1 sequence selected can be a portion of the 5'
non-coding human or murine OP-1 genomic DNA sequences, as stated
above, and at least one Wt-1/Egr-1 consensus binding sequence
alone or in combination with at least one of a TCC binding
sequence, a FTZ binding sequence, a steroid binding sequence, and
a human OP-1 intron DNA sequence. In another embodiment more than
one Wt-1/Egr-1 element is used, for example, between 1-6, or at
least six. These cells are suitable for use in the method of the
invention.

In one embodiment, part of the OP-1 coding region is
anticipated to have an expression regulatory function and also can
be added to a vector for use in the screening assay described
herein. OP-1 protein is translated as a precursor polypeptide
having an N-terminal signal peptide sequence (the "pre pro"
region) which is typically less than about 30 amino acid residues,
followed by a "pro" region which is about 260 amino acid residues,
followed by the additional amino acid residues which comprise the
mature protein. The pre pro and pro regions are cleaved from the
primary translation sequence to yield the mature protein sequence.
The mature sequence comprises both a conserved C-terminal seven
cysteine domain and an N-terminal sequence which varies
significantly in sequence between the various morphogens. The
mature polypeptide chains dimerize and these dimers typically are
stabilized by at least one interchain disulfide bond linking the
two polypeptide chain subunits. After the pro domain is cleaved
from the OP-1 protein it associates noncovalently with the mature
dimeric protein, presumably to enhance solubility and/or targeting
properties of the mature species. See, for example,
PCT/US93/07189, filed July 29, 1993. The pro region represents
the nucleotide sequence occurring approximately 87 bases
downstream of the ATG start codon, and continues for about 980
bases. The nucleotide sequence encoding the pro region is highly
enriched in a "GC" sequence, which well may be competent to form a
secondary structure (e.g., as part of the mRNA transcript) which
itself may modulate OP-1 expression. Accordingly, part or all of
the nucleotide sequence encoding an OP-1 pro region, particularly
that portion corresponding to a GC rich region, may be used,
preferably in combination with one or more OP-1 non-coding
sequences, in the compositions and methods of the invention.
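One way to locate the GC rich portion referred to above is to
compute the G+C fraction over a sliding window. The fragment below
is a minimal sketch under stated assumptions: the function names,
the window size and the threshold are illustrative choices, and the
test string is hypothetical rather than OP-1 sequence.

```python
def gc_fraction(sequence):
    """Fraction of bases that are G or C."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_rich_windows(sequence, window=50, threshold=0.7):
    """Return 1-based inclusive (start, end) windows at or above threshold."""
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        if gc_fraction(seq[start:start + window]) >= threshold:
            hits.append((start + 1, start + window))
    return hits


if __name__ == "__main__":
    # Hypothetical 14-base string with a GC-only core.
    print(gc_fraction("GCGCATAT"))
    print(gc_rich_windows("ATATGCGCGCATAT", window=6, threshold=1.0))
```

Windows flagged this way only suggest candidate regions; whether
such a region forms secondary structure or affects expression would
still need to be tested in the assay.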

In another embodiment, the method can be practiced using a
cell known to express the OP-1 gene. Suitable DNA sequences for
transfection are described below, as well as suitable cells
containing transfected DNA sequences.

In another aspect, the invention provides molecules,
vectors, methods and kits useful in the design and/or
identification of OP-1 expression modulating compounds. As used
herein a "kit" comprises a cell transfected with a DNA sequence
comprising a reporter gene in operative association with a portion
of OP-1 upstream DNA sequence and the reagents necessary for
detecting expression of the reporter gene. The portion of OP-1
upstream DNA chosen can be any of the various portions which have
been described herein.

Following this disclosure, medium flux screen assays, and
kits therefor, for identifying OP-1 expression modulating
compounds are available. These compounds can be naturally
occurring molecules, or they can be designed and biosynthetically
created using a rational drug design and an established
structure/function analysis methodology. The compounds can be
amino acid-based or can be composed in part or whole of
non-proteinaceous synthetic organic molecules.

The OP-1 expression modulating compounds thus identified then
can be produced in reasonable quantities using standard
recombinant expression or chemical synthesis technology well known
and characterized in the art and/or as described herein. For
example, automated means for the chemical synthesis of nucleic and
amino acid sequences are commercially available. Alternatively,
promising candidates can be modified using standard biological or
chemical methodologies to, for example, enhance the binding
affinity of the compound for a DNA element and the preferred
candidate derivative then can be produced in quantity.

Once a candidate compound has been identified it can be tested
for its effect on OP-1 expression. For example, a compound which
upregulates (increases) the production of OP-1 in a kidney cell
line is a candidate for systemic administration. The candidate
can be assayed in an animal model to determine the candidate
molecule's efficacy in vivo. For example, the ability of a
compound to upregulate levels of circulating OP-1 in vivo can be
used to correct bone metabolism diseases such as osteoporosis
(See, for example, PCT/US92/07932, supra). Useful in vivo animal
models for systemic administration are disclosed in the art and
below.

As demonstrated herein below, OP-1 is differentially expressed
in different cell types. Accordingly, it further is anticipated
that a candidate compound will have utility as an inducer of OP-1
expression in one cell type but not in another. Thus, the
invention further contemplates testing a candidate compound for
its utility in modulating expression of OP-1 in different cells in
vivo, including different cells known to express OP-1 under native
physiological conditions.

Thus, in view of this disclosure, one of ordinary skill in
recombinant DNA techniques can design and construct appropriate
DNA vectors and transfect cells with appropriate DNA sequences for
use in the method according to the invention to assay for
compounds which modulate the expression of OP-1. These identified
compounds can be used to modulate OP-1 production and its
available concentrations in both in vivo and in vitro contexts.

Brief Description of the Drawings

Fig. 1 shows the alignment of upstream sequences of the murine
and human OP-1 gene. The murine sequence is present in the upper
sequence lines and the human sequence is the lower sequence on all
lines. The murine sequence is numbered backwards, counting back
from the first ATG of the translated sequence which is shown
highlighted. For purposes of alignment, dashes are introduced
into the DNA sequence, and three portions of human DNA sequence
have been cut from the sequence and placed underneath a gap, below
a solid triangle;

Fig. 2 shows a time course of murine uterus OP-1 mRNA 
regulation by estrogen; and 

Fig. 3a shows a schematic of the 2 kb and 4 kb OP-1 mRNAs and the
hybridization locations of probes 1 through 7 (indicated by the
bars under the schematic). The solid line indicates OP-1 mRNA,
the asterisks (*) indicate potential poly A signals, and the boxes
indicate the translated portion of OP-1 mRNA, with the hatched box
showing the TGF-β-like domain. The dashed lines indicate genomic
DNA sequences. The arrows mark the locations of the cleavage site
for OP-1 maturation.

Fig. 3b shows a Northern blot hybridization analysis of OP-1
specific 2 kb and 4 kb mRNAs in murine uterine tissue. Lanes 1
through 7 correspond to probes 1 through 7 respectively. The 2 kb
and 4 kb mRNAs are indicated by the 4- and 2- on the left side of
Fig. 3b, and a 0.24 to 9.49 kb RNA size ladder is indicated by
dashes to the right of the figure.






Detailed Description 

As will be more fully described below, we have identified
regions in the OP-1 genetic sequence useful in identifying
molecules capable of modulating OP-1 expression in vivo. Also as
described herein, we have determined that OP-1 expression in vivo
can be dependent both on cell type and on the status of the cell
in a tissue. Specifically, as described herein below, OP-1
protein expression is differentially regulated in uterine tissue
depending on the status of the uterine tissue. For example, OP-1
expression is dramatically down-regulated in mouse uterine tissue
during pregnancy, whereas it is normally expressed in this tissue
in virgin mice. Moreover, OP-1 expression in other tissues such
as renal tissue apparently is unaffected during pregnancy.
Administration of estrogen to a virgin mouse is capable of
duplicating this down-regulation of OP-1 gene expression.

We investigated the DNA sequences responsible for the
regulation of OP-1 gene expression by cloning non-coding sequences
for the human and mouse OP-1 gene. The tissue-specific modulation
of OP-1 gene expression, and the significant homology which was
found between an approximately 750 base region of human and murine
5' non-coding OP-1 genomic sequence, implicate these sequences as
having utility in a method for the screening of compounds for
their ability to modulate OP-1 expression.

In view of this disclosure and the examples provided below, a
method for identifying molecules which can affect OP-1 expression
in a particular cell type in vivo now is provided.

Cloning of Human and Mouse OP-1 Gene Non-coding Sequences

In the Northern blot analysis of murine organs, multiple OP-1
transcripts are detected, namely, three species of 1.8, 2.2 and 2.4
kb and a prominent 4.0 kb RNA species (Ozkaynak et al., 1992,
J. Biol. Chem., 267:25220-25227; Ozkaynak et al., Biochem. Biophys.
Res. Comm., 179:116-123). The pattern is similar in rats with
only the 1.8 kb species absent. The estrogen-mediated
downregulation of OP-1 mRNA affects all of these species. In
order to prove that the 4.0 kb mRNA is in fact a transcript from
the same OP-1 locus, cDNA clones were isolated from a mouse
teratocarcinoma cDNA library.

Four independent clones were obtained that added sequence
information to the published mouse cDNA sequence. Two of these
cDNA clones have longer 5'-untranslated sequences (0.4 and 0.3 kb)
than previously reported (0.1 kb). Three of the murine clones
contain an additional 1.4 kb at the 3'-end. The combined sequences
add up to a total OP-1 cDNA size of 3.5 kb, about 0.5 kb shorter
than the 4.0 kb mRNA observed on Northern blots. cDNA clones that
represent the 2 kb and 4 kb messages are shown schematically in
Figure 3a. Since the polyA-tail is lacking in those cDNA clones
that extend the 3'-information, it was anticipated that the missing
0.5 kb of sequence occurs at the 3'-end.

In order to obtain the sequence immediately adjacent to the
3'-end of the 3.5 kb cDNA sequence, a mouse genomic library,
ML1039J (Clontech), was screened with a 3'-end cDNA specific probe
(0.45 kb, 3'-end XmnI-EcoRI fragment of murine OP-1 cDNA)
according to the parameters described below for the cloning of
upstream non-coding sequences. This screen yielded four lambda
clones which were analyzed by Southern blotting. All clones
yielded a 1.5 kb XmnI fragment which was subcloned from lambda 071
into a Bluescript vector and sequenced. Three polyadenylation
signals (AATAAA) (Proudfoot et al., (1976) Nature, 263:211-214)
were found in this genomic fragment, at 3.52, 3.58 and 3.59 kb
(shown schematically in Fig. 3a by the *). The 3'-end cDNA and
the genomic DNA sequences in the 1.5 kb XmnI fragment overlap by
0.4 kb in a region that immediately precedes the second
polyadenylation signal located at 3.5 kb (Figure 3a, region
indicated by probe 6) and are in complete agreement within this
stretch.

Human upstream non-coding sequence and additional mouse
upstream non-coding sequence were obtained by screening human and
mouse genomic libraries, HL1067J and ML1030J, respectively
(Clontech). All libraries were screened by an initial plating of
750,000 plaques (approximately 50,000 plaques/plate).
Hybridizations were done in 40% formamide, 5 x SSPE, 5 x
Denhardt's solution, and 0.1% SDS at 37°C. Nonspecific counts
were removed in 0.1 x SSPE, 0.1% SDS by shaking at 50°C. Human
and mouse upstream genomic DNA sequences were obtained from clones
lambda 03 and lambda 633, respectively (Clontech, HL1067J and
ML1030J). These lambda clones were isolated using a 32P-labeled
probe made from a human 0.47 kb EcoRI OP-1 cDNA fragment (obtained
from p6ll5) containing mainly 5' non-coding and exon 1 sequences.

A 7 kb EcoRI fragment from the human genomic clone, lambda 63,
was isolated which contains 5 kb of upstream non-coding sequence.
Additional upstream sequence information for the murine gene was
obtained by subcloning a 1.1 kb PstI fragment from the genomic
phage clone lambda 033. This fragment overlaps with the 5'-end of
the longest murine cDNA clone by 0.3 kb in the 5' non-coding region
and provided 0.8 kb of additional sequence information. A schematic
diagram of the 2- and 4 kb OP-1 messages is shown in Figure 3a
with dashed lines indicating supplementing information derived
from murine upstream and downstream genomic DNA.

All sequencing was done according to Sanger et al. (1977)
Proc. Natl. Acad. Sci. 74:5463-5467, using exonuclease III-mediated
unidirectional deletion (Ozkaynak et al., (1987)
BioTechniques, 5:770-773), subcloning of restriction fragments,
and synthetic primers. Compressions were resolved by performing
the reactions at 70°C with Taq polymerase and using 7-deaza-GTP
(U.S. Biochemical Corp., Cleveland, OH).



Verification of OP-1 mRNA Sequences by Northern Blotting

To verify the structures of the short and long mRNA species
observed, Northern blot hybridizations were performed with probes
made from seven non-overlapping DNA fragments (Fig. 3a; probes 1
through 7) specific to the 5' and 3' non-coding regions, the
protein coding sequence, and genomic regions upstream or
downstream of the predicted mRNAs, respectively.

Hybridization of these probes to individual Northern blot
strips containing mouse kidney mRNA is consistent with the
predicted 4 kb mRNA structure. As shown in Fig. 3a and Fig. 3b,
the genomic DNA probes 1 and 2 did not hybridize to any message.
Probe 2 is specific to the upstream sequences immediately adjacent
to the cDNA. Probes 3, 4, and 5, specific to 5' non-coding,
coding, and 3' non-coding regions, respectively, hybridized to
both the 2 kb and 4 kb messages; hence these sequences are present
in both messages. Probe 6, specific to sequences between the
first and second polyadenylation signals, hybridized only to the 4
kb message. Finally, probe 7, which is specific to sequences
further downstream of the fourth (last) polyadenylation signal,
did not hybridize to any message. The results obtained with these
probes confirm the two OP-1 mRNA structures and the approximate
5'- and 3'-end boundaries of OP-1 transcripts shown in Figure 3a.
This demonstrates that the 2 kb and 4 kb mRNAs are from the same
OP-1 genomic locus rather than from multiple genes.

The extensive 3' sequence included in the 4 kb mRNA transcript
suggests that the 3' untranslated sequence may play a role in OP-1
gene expression, particularly as it has been detected across
species, namely, in mouse, rat, dog, human and chicken. Multiple
stop codons in all three possible translation reading frames rule
out the likelihood that this sequence encodes a peptide. The
untranslated sequence itself may act therefore to influence mRNA
stability. For example, the sequence may interact with another
protein as has been described for transferrin receptor mRNA.
Here, IRE-binding protein (IRE; iron response element) stabilizes
the transferrin receptor mRNA by binding to the 3'-end of the mRNA
(Standard et al., 1990, Genes Dev., 4:2157-2168, incorporated
herein by reference). Alternatively, the 3'-end sequences may be
interacting with the 5'-end sequences, thereby affecting initiation
of protein synthesis, or the 3'-end sequences may be serving as a
binding site for other RNAs which can interfere with the binding
of an expression-modulating molecule, including a repressor
molecule (Klausner et al., 1989, Science, 246:870-872; Kozak,
1992, Ann. Rev. Cell Biol., 8:197-225, incorporated herein by
reference).
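
The reading-frame argument above can be illustrated with a short sketch. It scans the three forward reading frames of a DNA sequence for the standard stop codons and reports whether every frame is interrupted. The function names and the toy sequence are hypothetical and are not taken from the OP-1 sequence listing; the sketch only shows the kind of check that supports the conclusion that a 3' sequence is unlikely to encode a peptide.

```python
# Minimal sketch: locate stop codons in the three forward reading frames.
# The toy sequence below is illustrative only, not OP-1 sequence.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codon_positions(seq, frame):
    """Return 0-based start positions of stop codons in one frame (0, 1 or 2)."""
    seq = seq.upper()
    return [i for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS]


def stops_in_all_frames(seq):
    """Map each forward frame to the list of stop codon positions found in it."""
    return {frame: stop_codon_positions(seq, frame) for frame in range(3)}


def blocked_in_every_frame(seq):
    """True when each of the three forward frames contains a stop codon."""
    return all(stops_in_all_frames(seq).values())


toy_utr = "ATGTAAGCTGAGCTAGC"
frames = stops_in_all_frames(toy_utr)
```

A sequence for which `blocked_in_every_frame` is true has no uninterrupted forward reading frame, which matches the reasoning applied to the 3' untranslated sequence above.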






Comparison of 5' Non-coding Sequences of Human and Mouse OP-1 DNA

The cloning of the 5' non-coding genomic murine and human OP-1
DNA sequences demonstrated that a high degree of sequence homology
exists between the human and murine 5' non-coding DNA sequences.
The homology extends from the base immediately upstream of the
translation start site for the OP-1 morphogen protein to
approximately 750 bases upstream of the translation start site, as
is shown in the shaded regions of Fig. 1, with the murine
sequences being the upper lines and the human sequences being the
lower lines. The 5' nucleotide of the region of homology for the
human OP-1 5' non-coding sequence is base 2548 of Seq. ID No. 1
and for the murine OP-1 5' non-coding sequence is base 1549 of
Seq. ID No. 2. The significant homology between the human and
murine 5' non-coding sequences of OP-1 suggests that this region
may be important in the regulation of OP-1 expression. As will be
discussed in more detail below, this region contains several
conserved DNA sequences which have been identified as the DNA
binding sequences for two DNA binding proteins, Wt-1 and Egr-1,
which both recognize these DNA sequences. The DNA binding
sequences for Wt-1/Egr-1 present in human and murine are marked in
Fig. 1 with a single line. Also, the TCC binding sequence, a DNA
binding sequence for Wt-1 and Egr-1, is marked in Fig. 1 by the
double line. Wt-1 and Egr-1 proteins have also been implicated in
the regulation of expression of several genes which are unrelated
to OP-1.

Alignment of mouse and human OP-1 genomic sequences reveals a
conserved stretch of 0.75 kb just upstream of the first ATG that
contains several patterns with marked similarity to the
zinc-finger protein binding sequence (5'-GCG GGG GCG-3') specific for
Egr-1 and Wt-1 (Christy et al., 1989, PNAS, 86:8737-8741; Rauscher
et al., 1990, Science, 250:1259-1262; Drummond et al., 1992,
Science, 257:664-678). In mouse a total of 8, and in human 7,
patterns conforming to the degenerate Egr-1/Wt-1 binding sequence
(5'-GNG NGG GNG-3') (Rupprecht et al., 1994, J. Biol. Chem., 269:
6198-6202; Werner et al., 1994, J. Biol. Chem., 269:12940-12946)
are located before and after the presumed transcriptional
initiation site (Fig. 1, shown by solid single lines). The
presence of these has significance in light of the elevated levels
of Wt-1 mRNA in the rat uterus decidua during pregnancy (Zhou et
al., 1993, Differentiation, 54:109-114).
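
A search for the degenerate binding pattern described above can be sketched as follows. The sketch translates the degenerate sequence 5'-GNG NGG GNG-3' into a regular expression, where N stands for any base, and scans both strands of an input sequence, allowing overlapping matches. The function names are hypothetical, and the short test sequences are illustrative rather than taken from Seq. ID No. 1 or 2; this is not presented as the procedure used to annotate Fig. 1.

```python
import re

# Degenerate Egr-1/Wt-1 pattern from the text, written without spaces.
MOTIF = "GNGNGGGNG"


def motif_to_regex(motif):
    """Expand N to a character class matching any of the four bases."""
    return "".join("[ACGT]" if base == "N" else base for base in motif)


def reverse_complement(seq):
    """Return the reverse complement of an upper- or lower-case DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq, motif=MOTIF):
    """Return sorted (start, strand, matched_text) tuples.

    start is the 0-based position on the forward strand of the
    leftmost base covered by the site. A lookahead is used so that
    overlapping sites are all reported.
    """
    seq = seq.upper()
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    for m in pattern.finditer(rc):
        hits.append((len(seq) - m.start() - len(motif), "-", m.group(1)))
    return sorted(hits)


forward_hits = find_sites("AAGCGGGGGCGTT")
```

Counting the entries returned for each species' upstream sequence would give totals comparable to the 8 (mouse) and 7 (human) patterns reported above, provided the same sequence window and strand conventions are used.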

The analysis also revealed, in the human upstream region, a
pattern of seven TCC repeats, present at -561, immediately 3' of
two Egr/Wt-1 sequences (at -624 and -587) (Figure 1, shown by
double solid lines and at position 2758-2778 of Seq. ID No. 1).
The mouse upstream region contains a similar, albeit less obvious,
sequence at -356 and at position 1755-1769 of Seq. ID No. 2. This
TCC-repeat pattern is found in the promoters of PDGF-A and several
other growth-related genes, and Wt-1 has been found to activate
transcription when either of the sequences is present and to
suppress it when both sequences are present (Wang et al., 1992,
Biochem. Biophys. Res. Comm., 188:433-439; Wang et al., 1993, PNAS,
90:8896-8900, incorporated herein by reference). Accordingly,
estrogen receptor may exert its effect on OP-1 expression in
uterus by upregulating Wt-1, either directly or indirectly.
Alternatively, or in addition, other regulatory elements located
further upstream of the OP-1 gene may be involved in estrogen
regulation.
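
A tandem TCC repeat of the kind described above can be located with a simple pattern match. The sketch below reports the start position and repeat count of each run of consecutive TCC units. The minimum repeat count of three is an arbitrary, hypothetical threshold, and the test sequence is a toy example. Seven repeats span 21 bases, which agrees with the interval 2758-2778 given for Seq. ID No. 1.

```python
import re


def find_tcc_repeats(seq, min_repeats=3):
    """Return (start, repeat_count) for each run of tandem TCC units.

    start is 0-based; min_repeats is an illustrative cutoff only.
    """
    pattern = re.compile("(?:TCC){%d,}" % min_repeats)
    return [(m.start(), len(m.group(0)) // 3)
            for m in pattern.finditer(seq.upper())]


toy = "AA" + "TCC" * 7 + "GG"
runs = find_tcc_repeats(toy)
```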

Also on Fig. 1, the human 5' non-coding DNA sequence is shown
to contain a Fushi-tarazu (FTZ) binding sequence which is marked
by carats below the human DNA sequence. A FTZ binding sequence is
bound by the Fushi-tarazu protein (FTZ-F1), which is a member of
the superfamily of nuclear receptors (Parker, (1993) Current
Opinion in Cell Biology, 5:499-504). The superfamily of nuclear
receptor proteins includes the receptors for steroid hormones,
retinoids, thyroid hormone, nerve growth factor and Fushi-tarazu,
and its members are structurally related. FTZ-F1 is likely to
belong to a subfamily of nuclear receptors that bind DNA as monomers.



The FTZ-F1 protein is a positive regulator of the fushi-tarazu
gene in blastoderm stage embryos of Drosophila. FTZ-F1 is
closely related to the silkworm (Bombyx) BmFTZ-F1 protein and the
mouse embryonal long terminal repeat binding protein (ELP), and all
of them are members of the nuclear hormone receptor superfamily
which recognize the same 9 base pair sequence, 5'-PyCAAGGPyCPu-3'.
The FTZ binding sequence does not apparently have a direct or
inverted repeat. In contrast, other members of the nuclear
hormone receptor superfamily usually bind to repeated sequences.
Nevertheless, the FTZ-F1, BmFTZ-F1 and ELP proteins have high
affinities for the FTZ binding site DNA, indicating that the
mechanism of binding is somewhat different from that of
other members of the nuclear hormone receptor superfamily
(Hitachi et al., 1992, Mol. and Cell Biology, December, pp. 5667-5672).



The mRNA transcription initiation site for human OP-1 is
marked on Fig. 1 by the upward arrow, and the OP-1 protein
translation initiation site is marked on Fig. 1 by the solid
triangles just prior to the highlighted ATG. The transcription
initiation site for the human OP-1 gene is at base 2790 of Seq. ID
No. 1 and the analogous site for murine is at base 1788 of Seq. ID
No. 2. The translation initiation site for the human OP-1 gene is
at base 3318 of Seq. ID No. 1 and for murine it is at base 2296 of
Seq. ID No. 2. The high degree of identity that the murine and
human DNA sequences share in the region between the transcription
initiation site and the translation initiation site suggests that
this region likely plays a role in the modulation of the
expression of the OP-1 gene product.

Analysis of OP-1 Gene Expression in Mouse Tissues






A detailed analysis of the uro-genital tract of rats has
revealed OP-1 mRNA expression in the renal (kidney) and bladder
tissues, as well as at other sites of the urogenital organ system.
The most abundant levels are present in renal and uterine tissue
(8 week old mice), while much lower levels were found in ovaries.
The mRNA level of G3PDH, a "housekeeping function" molecule, was
used as an internal control for recovery and quality of mRNA
preparations, and equal amounts of poly(A)+ RNA (5 µg) were loaded
into each lane.

Preparation of RNA and Northern blot hybridization analysis
was conducted as follows. 8-week-old female mice, strain CD-1,
were obtained from Charles River Laboratories, Wilmington, MA.
Total RNA from the various organs of mice was prepared using the
acid-guanidine thiocyanate-phenol-chloroform method (Chomczynski
et al., (1987) Anal. Biochem. 162:156-159). The RNA was dissolved
in TES buffer (10 mM Tris-HCl, 1 mM Na2-EDTA, 0.1% SDS, pH 7.5)
containing Proteinase K (Stratagene, La Jolla, CA; approx. 1 mg
proteinase/ml TES) and incubated at 37°C for 1 hr. Poly(A)+ RNA
was selected in a batch procedure on oligo(dT)-cellulose
(Stratagene, La Jolla, CA) in 0.5 M NaCl, 10 mM Tris-HCl, 1 mM
Na2-EDTA, pH 7.4 (1 x binding buffer). For the selection of
poly(A)+ RNA, total RNA obtained from 1 g of tissue was mixed with
approximately 0.1 g of oligo(dT)-cellulose (in 11 ml TES containing
0.5 M NaCl). The tubes containing the RNA and oligo(dT)-cellulose
were gently shaken for approx. 2 hrs. Thereafter, the
oligo(dT)-cellulose was washed twice in 1 x binding buffer and once
in 0.5 x binding buffer (0.25 M NaCl, 10 mM Tris-HCl, 1 mM Na2-EDTA,
pH 7.4) and poly(A)+ RNA was eluted with water and precipitated with
ethanol.

Poly(A)+ RNA (5 µg per lane) was electrophoresed on 1.2%
agarose-formaldehyde gels with 1 µl of 400 µg/ml ethidium bromide
added to each sample prior to heat denaturation (Rosen et al.,
(1990) Focus, 12:23-24). Electrophoresis was performed at 100
Volts with continuous circulation of the 1 x MOPS buffer (Ausubel
et al., eds., (1990) Current Protocols in Molecular Biology, John
Wiley & Sons, New York). Following electrophoresis, the gels were
photographed, rinsed briefly in water, and blotted overnight onto
Nytran (Schleicher & Schuell Inc., Keene, NH) or Duralon-UV
(Stratagene) membranes in 10 x SSC. The membranes were dried at
80°C for 30 min. and irradiated with UV light (1 mW/cm2 for 25
sec.).

The 32P-labeled probe was made from a murine OP-1 cDNA
fragment (0.68 kb BstXI-BglI fragment) by random hexanucleotide
priming (Feinberg et al., (1984) Anal. Biochem., 137:266-267).
The hybridizations were done in 40% formamide, 5 x SSPE, 5 x
Denhardt's, 0.1% SDS, pH 7.5 at 37°C overnight. The non-specific
counts were washed off by shaking in 0.1 x SSPE, 0.1% SDS at 50°C.
For re-use, filters were stripped in 1 mM Tris-HCl, 1 mM Na2-EDTA,
0.1% SDS, pH 7.5 at 80°C for 10 min.

Analysis of OP-1 Expression During Pregnancy in Mice 

An examination of the effect of pregnancy upon OP-1 expression
was undertaken by measuring OP-1 mRNA levels in kidney, ovary and
uterus, before, during, and after pregnancy (virgins, 2-day post-
coital (pc), 4-day pc, 6-day pc, 8-day pc, 13-day pc, 17-day pc,
3-day lactating, and retired breeders) by Northern blot
hybridization of poly(A)+ RNA. These measurements demonstrated
that, while kidneys show no pregnancy-related changes in OP-1 mRNA
levels, the uterine levels became nearly undetectable by 6-day pc.
However, no changes were observed in the ovaries. A dramatic and
rapid decline in OP-1 message in uterine tissue between day 3 and
4 of pregnancy is apparent in the comparison with virgin animals.

The levels of OP-1 mRNA in the embryo and maternal levels in
uterus of 8 week old mice at day 13 and 16 of the pregnancy were
also compared. While the OP-1 expression in the pregnant uterus
is dramatically reduced, high levels of OP-1 message are found in
the mouse embryo at 13- and 16-days. Thus, at a stage of
pregnancy when OP-1 mRNA expression in the maternal uterus is
almost undetectable, embryonal OP-1 expression is high. The high
embryonal OP-1 expression also is consistent with the
relatively high levels of OP-1 mRNA found in human placenta. The
level of OP-1 mRNA measured in the embryo is in the same range as
that measured in adult kidney or virgin uterus tissue. Hence, it
is likely that OP-1 plays a critical role in the development of
the embryo, which may require appropriate amounts of OP-1 at very
specific stages of tissue and organ morphogenesis. While not being
limited to any given theory, it is possible that OP-1 expression
in uterine tissue during pregnancy potentially could interfere
with the level of OP-1 produced by the developing embryo, and
thereby interfere with proper development of the embryo.
Therefore, a shut-down or inhibition of uterine OP-1 expression
during pregnancy might be for the benefit of the fetus.

Effect of Estrogen and Progesterone on OP-1 Expression 

During pregnancy the estrogen and progesterone levels increase
many fold and high levels are sustained until birth. To determine
whether these hormonal changes are responsible for the altered
OP-1 transcription in pregnant uterine tissue, non-pregnant female
mice were subcutaneously administered 17β-estradiol, or
progesterone, or a combination of both.

In the first experiment the rapid increase in estrogen and
progesterone levels during pregnancy was simulated. Non-pregnant
mice were injected subcutaneously on four consecutive days with
increasing doses, starting with 20 µg 17β-estradiol, or 100 µg
progesterone, or the combination of both, and doubling the dose on
each following day. On the fourth day the animals were sacrificed
and mRNA was isolated from uteri and kidneys. A striking negative
effect of 17β-estradiol on the uterine OP-1 mRNA expression was
observed, but no effect by progesterone was seen. In the kidneys,
however, mRNA levels did not change after 17β-estradiol or
progesterone treatment.
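
The escalating dose regimen of the first experiment can be written out as a small calculation. The sketch below lists the dose for each of the four consecutive days when the starting dose is doubled on each following day. The function name is hypothetical, and the doses are expressed in the same unit as the starting dose given in the text (20 for 17β-estradiol, 100 for progesterone).

```python
def doubling_schedule(starting_dose, days):
    """Return the dose for each day when the dose doubles daily."""
    return [starting_dose * 2 ** day for day in range(days)]


estradiol_doses = doubling_schedule(20, 4)
progesterone_doses = doubling_schedule(100, 4)
```

Under this reading, the dose on the fourth day is eight times the starting dose for each hormone.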

Another experiment addressed the time course: 17β-estradiol
was administered to virgin female mice at a constant dose of 200
µg (50 µl of 4 mg/ml 17β-estradiol per day, subcutaneously in DMSO
[dimethyl sulfoxide] + 150 µl 150 mM NaCl) (Figure 2). Following
this, their uteri were extracted, poly(A)+ RNA was prepared, and
equal amounts of poly(A)+ RNA (5 µg) were loaded into each lane of
a 1.2% agarose-formaldehyde gel and analyzed by Northern blot
hybridization. The effect was rapid, with considerable decrease
of OP-1 mRNA 12 hours after administration of 17β-estradiol and
almost undetectable levels by 48 hours, as shown in Fig. 2. In
the figure, the lanes correspond as follows: from left to right,
0-day (negative control), 0-day (negative control), 0.5-, 1-, 2-,
3-, 4-, 5-, 6-, and 8-days. The arrowheads mark the two major
OP-1 mRNA species. A modest amount of message reappears a few
days later (Figure 2).

The uterus has been identified as a major site of OP-1
expression. The level of OP-1 expression in uterine tissue is
comparable to that observed in renal tissue. However, during
pregnancy, by day four, the uterine OP-1 mRNA levels are reduced
to the limit of detection. The loss of OP-1 expression
corresponds with rising levels of estrogen during this same
time frame. The same dramatic loss of uterine OP-1 message also
is observed in estrogen-treated animals, suggesting that estrogen
is involved in negative regulation of OP-1 expression in uterine
tissue. The effect of estrogen is rapid, with most of the message
disappearing after 12 hours of 17β-estradiol administration. The
reappearance of some OP-1 message at later days may be due to a
counter-regulatory mechanism. In contrast to the modulated OP-1
mRNA levels in the uterus, no substantial changes occur in renal
tissue during pregnancy or in response to estrogen treatment.
Therefore, OP-1 mRNA expression in these different organs is
regulated independently. The differential expression may be due,
for example, to a lack of estrogen receptors in renal tissue.
Alternatively, co-regulation by means of one or more accessory
molecules that interact with estrogen or a related nuclear
receptor molecule(s) may allow for the independent regulation.
For example, each of Wt-1 protein (which binds to the Wt-1/Egr-1
element) and OP-1 protein is required for normal kidney
development, and each is expressed at high levels during kidney
tissue development. As described above, the OP-1 promoter region
contains Wt-1 consensus binding elements. Wt-1 protein also has
been shown to negatively regulate the transcription of the insulin
growth factor II gene and the platelet-derived growth factor A
chain gene. Kreidberg et al., Cell, 1993, 74:679-691. Without
being limited to a given theory, it may be that Wt-1 protein,
either alone or in combination with one or more molecules, is
involved in the expression of OP-1. For example, Wt-1 protein may
act in concert with a nuclear hormone receptor element,
including, for example, the estrogen receptor element.

Implications of Tissue Specific Differential Regulation of OP-1
Expression

Estrogen also has been shown to inhibit the uterine expression
of calbindin-D28K, a vitamin D dependent calcium binding protein,
the α-subunit expression of the glycoprotein hormones, and other
proteins involved in bone formation. Estrogen also has been shown
to cause dramatic decreases in the steady state mRNA levels of
the bone matrix proteins osteocalcin, prepro α2(I) chain type I
collagen, osteonectin, osteopontin, and alkaline phosphatase in an
ovariectomized rat, which is a rat model for osteoporosis.

Estrogen appears to mediate its beneficial effect on bone
metabolism in the osteoporotic model through inhibition of
osteoclasts. Estrogen does not reverse osteoporosis. By
contrast, OP-1, which is expressed in uterine, renal and bone
tissues, is able to induce an increase in bone mass in the
osteoporotic model. Thus, the negative effect of estrogen on OP-1
expression in uterine tissue may seem unexpected in view of
estrogen's effect on bone metabolism.

In addition to the 5' non-coding DNA sequences of OP-1, the
other non-coding sequences such as introns and 3' non-coding
sequences may be involved in the modulation of OP-1 protein
expression. This invention presents a method in which these
non-coding sequences are assayed while in operative association with a
reporter gene for their influence on the expression of OP-1.
Non-coding sequences which are involved in the modulation of OP-1
expression will be identified by culturing cells transfected with
the non-coding sequences, in operative association with a reporter
gene, with one or more compound(s), measuring the level of
reporter gene expression, and comparing this level of expression
to the level of reporter gene expression in the absence of the
compound(s).
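
The comparison step of the assay described above can be sketched as a ratio of reporter signal measured with and without the candidate compound. The function names, the numeric readings and the two-fold threshold are hypothetical choices made for illustration; the text does not specify a cutoff or a particular reporter readout.

```python
def fold_change(with_compound, without_compound):
    """Ratio of reporter signal with the compound to signal without it."""
    if not without_compound > 0:
        raise ValueError("control signal must be positive")
    return with_compound / without_compound


def classify(with_compound, without_compound, threshold=2.0):
    """Label a compound by its effect on reporter expression.

    threshold is an assumed, illustrative fold cutoff.
    """
    if not without_compound > 0:
        raise ValueError("control signal must be positive")
    if with_compound == 0 or without_compound / with_compound >= threshold:
        return "down"
    if with_compound / without_compound >= threshold:
        return "up"
    return "no change"


label = classify(with_compound=50.0, without_compound=100.0)
```

In practice the two readings would be replicate-averaged measurements from the same transfected cell population, and the cutoff would be chosen from the assay's own variability.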

EXEMPLARY CELLS, VECTORS, REPORTER GENES AND ASSAYS FOR USE IN 
SCREENING COMPOUNDS WHICH MODULATE OP-1 REGULATORY SEQUENCES 


I. Useful Cells 

Any eukaryotic cell, including an immortalized cell line
suitable for long term culturing conditions, is contemplated to be
useful for the method and cell of the invention. Useful cells
should be easy to transfect, be capable of stably maintaining
foreign DNA with an unrearranged sequence, and have the necessary
cellular components for efficient transcription and translation of
the protein, including any elements required for post-
translational modification and secretion, if necessary. Where the
cell is to be transfected with a non-dominating selection gene,
the cell genotype preferably is deficient for the endogenous
selection gene. Preferably, the cell line also has simple media
composition requirements and rapid generation times.
Particularly useful cell lines are mammalian cell lines, including
myeloma, HeLa, fibroblast, embryonic and various tissue cell
lines, e.g., kidney, liver, lung and the like. A large number of
cell lines now are available through the American Type Culture
Collection (Rockville, MD) or through the European Collection of
Animal Cell Cultures (Porton Down, Salisbury, SP4 0JG, U.K.).

Where, as here, the expression of a reporter gene that is
controlled by non-coding sequences of the morphogen OP-1 is to be
analyzed, particularly useful cells and cell lines are envisioned
to include eukaryotic, preferably mammalian, cells of a tissue and
cell type known to express OP-1 and/or closely related proteins.
Such cells include, without limitation, cells of uro-genital cell
origin, including kidney, bladder and ovary cells, lung, liver,
mammary gland and cardiac cells, cells of gonadal origin, cells of
gastrointestinal origin, glial cells and other cell lines known to
express endogenous genes encoding morphogenic proteins. Preferred
cell lines are of epithelial origin.

II. Exemplary Vectors/Vector Construction Considerations 

Useful vectors for use in the invention include, but are not 
limited to cosmids, phagemids, yeast artificial chromosomes or 
other large vectors. Vectors that can be maintained within the 
nucleus or integrated into the genome by homologous recombination 
are also useful. For example, a vector such as pSV2CAT would be
useful.

Selected portions of non-coding OP-1 sequence can be cloned
into a useful vector using standard molecular cloning techniques,
as will be apparent to one of ordinary skill in the art.
Restriction endonuclease sites will be utilized when possible, and
can be engineered into the sequence when needed. If restriction
endonuclease sites need to be engineered into the sequence,
eight base recognition sites are preferable because they generally
occur infrequently in DNA and will enhance a practitioner's ability
to obtain the sequence of interest. Restriction endonuclease
sites can be engineered into the non-coding sequence using
common techniques such as site directed mutagenesis and PCR with
primers including the desired restriction endonuclease site.
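
The preference for eight base recognition sites can be illustrated with a rough frequency estimate. Under the simplifying assumption that the four bases occur independently and with equal probability (an assumption made here for illustration, and not true of real genomic DNA), a specific site of length n is expected about once every 4^n bases. The function names are hypothetical.

```python
def expected_spacing(site_length, alphabet_size=4):
    """Expected number of bases between occurrences of one specific site,
    assuming independent, equally likely bases."""
    return alphabet_size ** site_length


def expected_sites(sequence_length, site_length):
    """Approximate expected count of a specific site in a random sequence."""
    return sequence_length / expected_spacing(site_length)


six_cutter = expected_spacing(6)    # 4096 bases on average
eight_cutter = expected_spacing(8)  # 65536 bases on average
```

On this estimate an eight base site is sixteen times rarer than a six base site, so an engineered eight base site is less likely to also occur within the cloned fragment itself.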

As discussed above, murine and human OP-1 sequences share a
region of high homology covering approximately 750 bases upstream
of the translation initiation site as shown by the shading in Fig.
1. This region is positions 2548-3317 of Seq. ID No. 1 and
positions 1549-2296 of Seq. ID No. 2. The mRNA transcription
initiation site lies within this region at position 2790 of Seq.
ID No. 1 and by analogy at position 1788 of Seq. ID No. 2, shown
in Fig. 1 by the upward arrow. This suggests that positions 2548-
2790 of Seq. ID No. 1 and 1549-1788 of Seq. ID No. 2 contain
conserved promoter elements for the expression of OP-1 mRNA, and
approximately 500 bases at positions 2791-3317 of Seq. ID No. 1
and positions 1790-2296 of Seq. ID No. 2 contain conserved
elements of the transcribed, but not translated, sequences all or
part of which may be involved in the regulation of OP-1
expression. Additionally, sequences upstream of the homology
region may also be involved in the regulation of OP-1 expression.
Thus a range of upstream sequences, including sequences upstream
of the transcription initiation site and not including the
approximately 500 bases of transcribed sequence, can be fused in
operative association with a reporter gene to modulate expression
of the gene.
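
The coordinate ranges quoted above can be turned into fragment lengths and sequence slices with a short helper. The sketch below treats the positions as 1-based and inclusive, which is an assumption about the numbering convention of the sequence listing, and uses hypothetical function and dictionary names. The actual sequences of Seq. ID No. 1 and 2 are not reproduced here.

```python
def region_length(start, end):
    """Length of a 1-based, inclusive interval."""
    if start > end:
        raise ValueError("start must not exceed end")
    return end - start + 1


def extract(seq, start, end):
    """Slice a 1-based, inclusive interval out of a sequence string."""
    return seq[start - 1:end]


# Positions as given in the text for Seq. ID No. 1 (human).
HUMAN_REGIONS = {
    "homology region": (2548, 3317),
    "upstream through transcription start": (2548, 2790),
    "transcribed, not translated": (2791, 3317),
}

# Positions as given in the text for Seq. ID No. 2 (murine).
MOUSE_REGIONS = {
    "homology region": (1549, 2296),
    "upstream through transcription start": (1549, 1788),
    "transcribed, not translated": (1790, 2296),
}

human_lengths = {name: region_length(*pos) for name, pos in HUMAN_REGIONS.items()}
mouse_lengths = {name: region_length(*pos) for name, pos in MOUSE_REGIONS.items()}
```

With these conventions the human intervals come to 770, 243 and 527 bases and the murine intervals to 748, 240 and 507 bases, consistent with the "approximately 750" and "approximately 500" figures in the text.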

3' non-coding sequences and intron sequences also can be fused
in operative association with a reporter gene, either separately
or in combination with each other or with 5' non-coding sequences.
For example, one can place the 5' sequences defined by positions
2790-3317; 2548-2790 or 2548-3317 of Seq. ID No. 1, and
either/both of 3' sequences or intron sequences in operative
association with a reporter gene. The positions of the six
introns are shown in Seq. ID No. 1 as bases 3736 to 10700; bases
10897 to 11063; bases 11217 to 11424; bases 11623 to 13358; bases
13440 to 10548; bases 15166 to 17250.

Also envisioned is a nucleic acid construct comprising a small
fragment of 5' non-coding OP-1 sequence in combination with
additional conserved elements such as one or more Wt-1/Egr-1
binding sequences; a TCC binding sequence and/or a FTZ binding
sequence in operative association with a reporter gene. Such a
nucleic acid construct also could include intron sequences and/or
3' non-coding sequences.

A range of useful 5' non-coding fragments has been provided,
and as will be apparent to those of ordinary skill in the art,
smaller fragments of OP-1 sequence also are useful. Such smaller
fragments can be identified by deleting bases from one or both
ends of the provided 5' non-coding fragments, using techniques
that are well known in the art, and testing the truncated
constructs for their ability to modulate reporter gene expression.
In this way, the shortest modulating sequences can be identified.
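
The deletion strategy described above can be sketched in code. The
following Python is an illustrative assumption, not a protocol from
this disclosure: it enumerates nested 5' truncations of a fragment at
a fixed step size and, given reporter activities measured for each
truncated construct, picks the shortest one that still meets a
user-chosen activity threshold. The function names, the step size and
the threshold are all hypothetical.

```python
from typing import Dict, List, Optional


def five_prime_deletions(fragment: str, step: int, min_length: int) -> List[str]:
    """Remove `step` bases at a time from the 5' end, stopping at min_length."""
    if step <= 0:
        raise ValueError("step must be positive")
    last_start = len(fragment) - min_length
    return [fragment[i:] for i in range(0, last_start + 1, step)]


def shortest_active(activities: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the shortest construct whose measured activity meets the threshold."""
    active = [seq for seq, value in activities.items() if value >= threshold]
    return min(active, key=len) if active else None
```

Truncation from the 3' end, or from both ends, follows the same
pattern with the slice taken from the other side.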






III. Transfection Considerations 

Any method for incorporating nucleic acids into cells of
interest is contemplated in the method of the invention. Calcium
phosphate (CaPO4), followed by glycerol shock, is a standard means
used in the art for introducing vectors, particularly plasmid DNA,
into mammalian cells. A representative method is disclosed in
Cockett et al. (1990) Biotechnology 8:662-667, incorporated
herein by reference. Other methods that may be used include
electroporation, protoplast fusion (particularly useful in myeloma
transfections), microinjection, lipofection and DEAE-dextran
mediated uptake. Methods for these procedures are described in
F.M. Ausubel, ed., Current Protocols in Molecular Biology, John
Wiley & Sons, New York (1989).

As will be appreciated by those having skill in the art,
optimal DNA concentrations per transfection will vary according to
the transfection protocol. For calcium phosphate transfection,
for example, preferably 5-10 µg plasmid DNA per plasmid type is
transfected. In addition, the DNA to be transfected preferably is
essentially free of contaminants that may interfere with DNA
incorporation. A standard means used in the art for purifying DNA
is by ethidium bromide banding.

IV. Exemplary Reporter Genes

There are numerous reporter systems commercially available,
which include, without limitation, the chloramphenicol
acetyltransferase (CAT), luciferase, GAL4, and the human growth
hormone (hGH) assay systems.

CAT is a well characterized and frequently used reporter
system, and a major advantage of this system is that it is an
extensively validated and widely accepted measure of promoter
activity. See, for example, Gorman, C.M., Moffat, L.F., and
Howard, B.H. (1982) Mol. Cell. Biol., 2:1044-1051 for a
description of the reporter gene and general methodology. In this
system cells are harvested 2-3 days after transfection with CAT
expression vectors and extracts prepared. The extracts are
incubated with acetyl CoA and radioactive chloramphenicol.
Following the incubation, acetylated chloramphenicol is separated
from the nonacetylated form by thin layer chromatography. In this
assay the degree of acetylation reflects the CAT gene activity
with the particular promoter.

Another well-recognized reporter system is the firefly
luciferase reporter system. See, for example, Gould, S.J., and
Subramani, S. (1988) Anal. Biochem. 7:404-408 for a description
of the reporter gene and general methodology. The luciferase
assay is fast and has increased sensitivity. The system also is
particularly useful in bulk transfections or if the promoter of
interest is weak. In this assay transfected cells are grown under
standard conditions, and when cultured under assay conditions both
ATP and the substrate luciferin are added to the cell lysate. The
enzyme luciferase catalyzes a rapid, ATP dependent oxidation of
the substrate, which then emits light. The total light output is
measured using a luminometer according to the manufacturer's
instructions (e.g., Promega) and is proportional to the amount of
luciferase present over a wide range of enzyme concentrations.

A third reporter system is based on immunologic detection of
hGH; it is quick and easy to use (Selden, R., Burke-Howie, K.,
Rowe, M.E., Goodman, H.M., and Moore, D.D. (1986), Mol. Cell.
Biol., 6:3173-3179, incorporated herein by reference). hGH is
assayed in the media, rather than in cell extracts. This allows
direct monitoring of a single population of transfected cells
over time.

As indicated above and as will be appreciated by those having
ordinary skill in the art, particular details of the conventional
means for transfection, expression, and assay of recombinant genes
are well documented in the art and are understood by those having
ordinary skill in the art. The instant invention enables and
discloses vectors, cells and a method for screening compounds to
determine the capability of compounds to modulate the expression
of OP-1 via the non-coding sequences of the OP-1 genomic DNA.

Further details on the various technical aspects of each of
the steps used in recombinant production of foreign genes in
mammalian expression systems can be found in a number of texts and
laboratory manuals in the art, such as, for example, F.M. Ausubel
et al., Ed., Current Protocols in Molecular Biology, John Wiley &
Sons, New York, (1989).



VIII. Exemplary Homologous/Non-Homologous Recombination

One approach to screen for inducers of (organ-specific) OP-1
expression in a particular cell line derived from a particular
tissue, such as renal or uterine tissue, is through gene targeting
by homologous recombination (Sedivy et al., W.H. Freeman & Co.,
New York (1992); A.S. Waldman, Crit. Rev. Oncol. Hematol. 12, 49
(1992)). In one strategy the endogenous (genomic) OP-1 gene is
replaced by another reporter gene which is optimally suited for
screening assays, such as the firefly luciferase gene. To target
the OP-1 gene in an appropriate cell line, e.g., a kidney cell
line or NBT-2, the following arrangement of genetic elements can
be assembled.

Genomic OP-1 upstream and promoter sequences, preferably 3000
to 5000 nucleotides in length, and which mediate the homologous
recombination, are attached to the luciferase gene. The OP-1
upstream sequences down to the first coding ATG can be attached at
the start codon ATG of the luciferase coding sequence, using a
restriction site such as NcoI, which can be introduced by site-
directed mutagenesis into both the promoter and the luciferase
sequences.

Also included is a selective marker, preferably the neo gene,
without its own promoter. Preferably, the selectable marker (neo)
is placed downstream of the reporter gene (luciferase), after an
intercistronic sequence derived from the poliovirus genome and
which allows translation of the selectable marker on the same
transcript as the reporter gene. Details of this approach,
including specific intercistronic sequences and the detailed steps
of homologous recombination, are described in the art, including
Jasin et al., PNAS USA 85:8583 (1988); Sedivy et al., PNAS USA
86:227 (1989); and Dorin et al., Science 243:1357 (1989), the
disclosures of which are incorporated herein by reference. As
described therein, the endogenous OP-1 gene is replaced by the
luciferase and neo coding sequences and the expression of these
sequences then is assayed in a standard screening protocol.

A genetic arrangement of OP-1 promoter (as much genomic OP-1
upstream sequence as possible, up to 10,000 bp) and reporter gene
(without its original promoter but joined directly to the OP-1 ATG
or in its vicinity) can also be introduced into cells on standard
eukaryotic expression vectors. These vectors carry selectable
markers (neo, dhfr, etc.) and will typically be integrated into
the host genome with variable copy number ranging from one to
several copies without efforts at amplification. Also, if
desired, the vector or gene copy number can be enhanced using a
well characterized amplifiable gene, such as dhfr in conjunction
with methotrexate. Commercial vectors designed for autonomous
replication without integration are readily available. One source
vector is the Episomal Expression Epstein Barr Virus Vector (pREP,
Invitrogen Corp., San Diego, CA).

Introns also can be tested for regulatory sequences as
described hereinabove using the methods described herein. One or
more intron sequences derived from a genomic OP-1 locus preferably
is introduced into proper mammalian cells using, for example, a
yeast artificial chromosome (pYACneo, Clontech, Inc., Palo Alto,
CA) (Ref. Albertson, H.M. et al., PNAS USA, 87:4256, 1990), or
other vectors adapted to allow transfer of large sequences, e.g.,
up to 1 megabase. As for the OP-1 5' or 3' noncoding sequences
described above, the intron sequence or a portion thereof is
incorporated in operative association with a reporter gene and the
ability of the sequence to modulate reporter gene expression then
is assessed.

X. Exemplary Screening Assay for Compounds which Alter OP-1 Gene
or Reporter Gene Levels

Candidate compound(s) which may be administered to affect the
level of a given endogenous morphogen, such as OP-1, or a reporter
gene that is fused to OP-1 non-coding sequence may be found using
the following screening assay, in which the level of reporter gene
production by a cell type which produces measurable levels of the
reporter gene expression product is determined by incubating the
cell in culture with and without the candidate compound, in order
to assess the effects of the compound on the cell. This can be
accomplished by detection of the reporter expression product
either at the protein or RNA level. The protocol is based on a
procedure for identifying compounds which alter endogenous levels
of morphogen expression; a detailed description also may be found
in PCT US92/07359.

Cultured cells are transfected with portions of OP-1
non-coding sequences in operative association with a reporter gene,
and such transfected cells are maintained with the vector
remaining as a plasmid in the cell nucleus, or the vector can be
integrated into the host cell genome, preferably at the OP-1
genomic locus.

Cell samples for testing the level of reporter gene expression
are collected periodically and evaluated for reporter gene
expression using the appropriate assay for the given reporter gene
as indicated in the section describing reporter gene assays, or,
alternatively, a portion of the cell culture itself can be
collected periodically and used to prepare polyA+ RNA for mRNA
analysis.

Once candidate compounds are identified, they can be
produced in reasonable, useful quantities using standard
methodologies known in the art. Amino acid-based molecules can be
encoded by synthetic nucleic acid molecules, and expressed in a
recombinant expression system as described herein above or in the
art. Alternatively, such molecules can be chemically synthesized,
e.g., by means of an automated peptide synthesizer.
Non-amino acid-based molecules can be produced by standard organic
chemical synthesis procedures.

Provided below is an exemplary protocol for carrying out the
method of the invention, using the CAT gene as the reporter gene
and one or more mammalian cell lines known to express OP-1. The
example is non-limiting, and other cells, reporter genes and OP-1
non-coding sequences are envisioned.

Exemplary Construction Of Representative Vectors For Transfections

A DNA fragment containing the OP-1 promoter can be joined to a
reporter gene for transfection into a cell line that expresses
endogenous OP-1. Suitable cell lines are selected by Northern
blot hybridization to an OP-1 specific probe (by analyzing the
cell extracts for OP-1 mRNA). Using this technology we have found
several cell lines which make high levels of OP-1 mRNA, among them
the kidney line IMCD and the bladder line NBT II.

An approximately 5 kb EcoRI, BamHI genomic fragment containing
approximately 4 kb of upstream OP-1 sequences as well as part of
the first intron is blunt-ended with T4 DNA polymerase and cloned
into a polylinker of a pUC vector (p0146-1). An approximately 3.5
kb DNA fragment containing human OP-1 upstream sequences is
obtained by deleting a portion of coding sequences and the first
intron from p0146-1 with the restriction enzyme EheI. The ~3.5 kb
fragment has blunt ends and contains mostly 5' non-coding
sequences and also includes a short stretch of 30 bases into the
OP-1 gene. This ~3.5 kb upstream fragment is ligated to a 1.6
kb HindIII-BamHI fragment from the CAT gene obtained from the
vector SV2CAT by 5' HindIII end blunted ligation. The 1.6 kb CAT
gene fragment contains about 70 bases of upstream sequences.
These ligated fragments are cloned into Bluescript KS(-) vector
(Stratagene, La Jolla, CA). This construct in turn is subjected to
site specific mutagenesis to delete the extra sequences
(approximately 30 bases) from the 3' end of the OP-1 upstream
sequences and the adjacent 5' non-coding sequences (approximately
70 bases) from the CAT gene. This mutagenesis results in the
elimination of any OP-1 coding sequences from the promoter
fragment as well as any non-coding sequences upstream of the CAT
gene. Thus the resulting construct is a fusion of OP-1 upstream
sequences with the CAT gene sequences which encode the CAT
protein. This approximately 5 kb fragment is then excised from
Bluescript using HindIII and BamHI and ligated into a HindIII-
BamHI cut and gel purified backbone of the pSV2CAT vector, for
transfection into suitable cell lines.

Suitable cell lines include cell lines that have been shown to
contain high levels of OP-1 mRNA, indicating that the OP-1
promoter is active in the cells. Two of these cell lines are
mouse inner medullary collecting duct (IMCD) cells, and the rat
bladder carcinoma line (NBT II). However, other cell lines of the
uro-genital system that produce high levels of the OP-1 message
can be used in addition to the many previously mentioned cell
types and cell lines.

The transfection of this vector into an OP-1 producing cell
line is accomplished following standard techniques, i.e.,
transfection using calcium phosphate, liposome mediated
transfection, electroporation, or DEAE-dextran transfection.

The transfected cells are harvested 48-72 hours after
transfection with the CAT expression vector and extracts are made
by successive freeze-thawing. 2 µl of 200 µCi/ml 14C-
chloramphenicol (35 to 55 mCi/mmol), 20 µl of 4 mM acetyl CoA, 32.5
µl of 1 M Tris-HCl, pH 7.5, and 75.5 µl of water are added to 20 µl
of cell extract, and incubated for 1 hour at 37 degrees Celsius.
Upon completion of incubation, 1 ml ethyl acetate is added to the
reaction, which is microcentrifuged for 1 minute, and the top layer
is removed. This top layer is dried down in a SpeedVac for 45
minutes, and each sample is resuspended in 30 µl of ethyl acetate.
The samples are spotted onto a plastic-backed TLC sheet for
chromatography. The thin layer is then developed in a tank
containing 200 ml of 19:1 chloroform/methanol. The chromatography
is run for 2 hours and the sheet is placed under film for
autoradiography. The activity of the 14C in the monoacetylated
chloramphenicol series is calculated as described in Current
Protocols in Molecular Biology, 1993 (Ausubel et al., eds., John
Wiley & Sons, New York).
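
CAT activity from such a chromatogram is commonly summarized as the
percentage of recovered radioactivity found in the acetylated
species. The short Python sketch below illustrates that arithmetic
only; it assumes counts (e.g., cpm from excised spots or densitometry
values) are already available, and the function name and inputs are
hypothetical rather than taken from the cited protocol.

```python
def percent_acetylated(acetylated_counts: float, unacetylated_counts: float) -> float:
    """Percentage of total recovered counts present in acetylated forms."""
    if acetylated_counts < 0 or unacetylated_counts < 0:
        raise ValueError("counts must be non-negative")
    total = acetylated_counts + unacetylated_counts
    if total == 0:
        raise ValueError("no counts recovered from the lane")
    return 100.0 * acetylated_counts / total
```

Values approaching 100% indicate substrate exhaustion, in which case
the extract generally is diluted and re-assayed so that the reading
falls in the linear range of the assay.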

Upon determination of CAT activity, the main construct can be
deleted in sections to determine the regions that are responsible
for the observed CAT activity. Alternatively, the upstream
sequences can be deleted unidirectionally, using an exonuclease
such as Bal31, and the deletion product can be analyzed in the CAT
activity assay. This system can also be used in the method of the
invention to screen compounds for their ability to modulate OP-1
expression by dividing the cells into several groups, and
culturing one group in the absence of any added compounds, and
culturing the other groups with one or more candidate compounds,
and comparing the resulting levels of CAT activity.
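
The comparison between the untreated group and the compound-treated
groups can be expressed as a fold change in reporter activity. The
following Python is a minimal sketch of that comparison under the
assumption that replicate activity values are available for each
group; the function names and the dictionary layout are hypothetical,
and no statistical test is implied.

```python
from statistics import mean
from typing import Dict, List


def fold_change(control: List[float], treated: List[float]) -> float:
    """Mean treated activity divided by mean untreated (control) activity."""
    baseline = mean(control)
    if baseline <= 0:
        raise ValueError("control activity must be positive")
    return mean(treated) / baseline


def screen(control: List[float], groups: Dict[str, List[float]]) -> Dict[str, float]:
    """Fold change for every candidate compound relative to the control group."""
    return {name: fold_change(control, values) for name, values in groups.items()}
```

A fold change above 1 suggests that the compound increases reporter
expression, and a value below 1 suggests a decrease; how large a
change counts as a hit is a choice left to the experimenter.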

While a readily assayable, well characterized, non OP-1
reporter gene is preferred in the method disclosed herein, as will
be appreciated by those having ordinary skill in the art, OP-1
coding sequence also may be used in the screening method of the
invention. The OP-1 expression preferably is determined by an
immunoassay or by Northern or dot blot or other means for
measuring mRNA transcript. See, for example, WO 95/11983,
published May 4, 1995 for a detailed description on assaying
changes in OP-1 levels in a cell or fluid.

XI. Exemplary Screening Assay for Compounds which Alter OP-1
Gene Expression in Endogenous Cell Type Models

OP-1 is expressed in a variety of different cell types,
including renal, bone, lung, heart, uterine, cardiac and neural
tissue. Candidate compounds can be identified which have a
modulating effect on cells of one tissue type but not another,
and/or wherein the effect is modulated in the different cells.
The assay described below can be used to evaluate the effect of a
candidate compound(s) in a particular cell type known to express
OP-1 under physiological conditions.

Cell cultures of kidney, adrenals, urinary bladder, brain,
or other organs, may be prepared as described widely in the
literature. For example, kidneys may be explanted from neonatal
or new born or young or adult rodents (mouse or rat) and used in
organ culture as whole or sliced (1-4 mm) tissues. Primary tissue
cultures and established cell lines, also derived from kidney,
adrenals, urinary bladder, brain, mammary, or other tissues may
be established in multiwell plates (6 well or 24 well) according
to conventional cell culture techniques, and are cultured in the
absence or presence of serum for a period of time (1-7 days).
Cells may be cultured, for example, in Dulbecco's Modified Eagle
medium (Gibco, Long Island, NY) containing serum (e.g., fetal calf
serum at 1%-10%, Gibco) or in serum-deprived medium, as desired,
or in defined medium (e.g., containing insulin, transferrin,
glucose, albumin, or other growth factors).

Samples for testing the level of OP-1 production include
culture supernatants or cell lysates, collected periodically and
evaluated for OP-1 production by immunoblot analysis (Sambrook et
al., eds., 1989, Molecular Cloning, Cold Spring Harbor Press, Cold
Spring Harbor, NY), or a portion of the cell culture itself,
collected periodically and used to prepare polyA+ RNA for RNA
analysis. To monitor de novo OP-1 synthesis, some cultures are
labeled according to conventional procedures with a 35S-
methionine/35S-cysteine mixture for 6-24 hours and then evaluated
for OP-1 synthesis by conventional immunoprecipitation methods.



XII. Exemplary In vivo Animal Model for Testing Efficacy of
Compounds to Modulate OP-1 Expression

It previously has been demonstrated that OP-1 can affect
osteoporosis in the standard ovariectomized rat model, as
indicated by the dose-response increase in alkaline phosphatase and
osteocalcin levels following injection with OP-1. The
osteoporotic rat model provides an in vivo model for evaluating
the efficacy of a candidate modulating compound. In order to
determine the effect of a candidate morphogen stimulating agent on
OP-1 production and, thereby, on bone production in vivo, alkaline
phosphatase and osteocalcin levels are measured under conditions
which promote osteoporosis, e.g., wherein osteoporosis is induced
by ovary removal in rats, and in the presence and absence of a
candidate modulating compound. A compound competent to enhance or
induce endogenous OP-1 expression should result in increased
osteocalcin and alkaline phosphatase levels.

Forty Long-Evans rats (Charles River Laboratories, Wilmington)
weighing about 200 g each are ovariectomized (OVX) using standard
surgical procedures, and ten rats are sham operated. The
ovariectomization of the rats produces an osteoporotic condition
within the rats as a result of decreased estrogen production.
Food and water are provided ad libitum. Eight days after
ovariectomy, the rats, prepared as described above, are divided
into three groups: (A) sham-operated rats; (B) ovariectomized
rats receiving 1 ml of phosphate-buffered saline (PBS) i.v. in the
tail vein; and (C) ovariectomized rats receiving various dose
ranges of the candidate stimulating agent either by intravenous
injection through the tail vein or direct administration to kidney
tissue.

The effect of the candidate compound on in vivo bone formation
can be determined by preparing sections of bone tissue from the
ovariectomized rats. Each rat is injected with 5 mg of
tetracycline, which will stain the new bone (visualized as a
yellow color by fluorescence), on the 15th and 21st day of the
study, and on day 22 the rats are sacrificed. The body weights,
uterine weights, serum alkaline phosphatase levels, serum calcium
levels and serum osteocalcin levels then are determined for each
rat. Bone sections are prepared and the distance separating each
tetracycline staining is measured to determine the amount of new
bone growth. The levels of OP-1 in serum following injection of
the candidate agent also can be monitored on a periodic basis
using, for example, the immunoassay described in sections V and
VII above.

V. Exemplary Determination of OP-1 Protein Production

Where OP-1 acts as the reporter gene, the gene product
readily can be detected using antibodies specific to the
protein and standard immunoassay testing. For example, OP-1 may
be detected using a polyclonal antibody specific for OP-1 in an
ELISA, as follows.

1 µg/100 µl of affinity-purified polyclonal rabbit IgG
specific for OP-1 is added to each well of a 96-well plate and
incubated at 37°C for an hour. The wells are washed four times
with 0.167M sodium borate buffer with 0.15 M NaCl (BSB), pH 8.2,
containing 0.1% Tween 20. To minimize non-specific binding, the
wells are blocked by filling completely with 1% bovine serum
albumin (BSA) in BSB and incubating for 1 hour at 37°C. The wells
are then washed four times with BSB containing 0.1% Tween 20. A
100 µl aliquot of an appropriate dilution of each of the test
samples of cell culture supernatant is added to each well in
triplicate and incubated at 37°C for 30 min. After incubation,
100 µl biotinylated rabbit anti-OP-1 serum (stock solution is
about 1 mg/ml and diluted 1:400 in BSB containing 1% BSA before
use) is added to each well and incubated at 37°C for 30 min. The
wells are then washed four times with BSB containing 0.1% Tween
20. 100 µl streptavidin-alkaline phosphatase (Southern
Biotechnology Associates, Inc., Birmingham, Alabama, diluted 1:2000
in BSB containing 0.1% Tween 20 before use) is added to each well
and incubated at 37°C for 30 min. The plates are washed four times
with 0.5M Tris buffered Saline (TBS), pH 7.2. 50 µl substrate
(ELISA Amplification System Kit, Life Technologies, Inc.,
Bethesda, MD) is added to each well and incubated at room
temperature for 15 min. Then, 50 µl amplifier (from the same
amplification system kit) is added and incubated for another 15 min
at room temperature. The reaction is stopped by the addition of
50 µl 0.3 M sulphuric acid. The OD at 490 nm of the solution in
each well is recorded. To quantitate OP-1 in culture media, an
OP-1 standard curve is performed in parallel with the test samples.
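
Reading sample concentrations off a standard curve can be sketched as
follows. This Python example is an assumption-laden illustration, not
the method of the disclosure: it uses simple linear interpolation
between neighboring standards (other curve fits may be preferred in
practice), and the function names, the (concentration, OD) data layout
and the dilution factor argument are all hypothetical.

```python
from typing import List, Tuple


def interpolate_concentration(od: float, standards: List[Tuple[float, float]]) -> float:
    """Linearly interpolate a concentration from (concentration, OD) standards."""
    points = sorted(standards, key=lambda pair: pair[1])  # order by OD
    for (c_low, od_low), (c_high, od_high) in zip(points, points[1:]):
        if od_low <= od <= od_high:
            if od_high == od_low:
                return c_low
            fraction = (od - od_low) / (od_high - od_low)
            return c_low + fraction * (c_high - c_low)
    raise ValueError("OD lies outside the range of the standard curve")


def sample_concentration(od: float, standards: List[Tuple[float, float]],
                         dilution_factor: float = 1.0) -> float:
    """Concentration in the undiluted sample, correcting for the assay dilution."""
    return interpolate_concentration(od, standards) * dilution_factor
```

Readings outside the range of the standards raise an error rather
than being extrapolated, since such samples normally are re-assayed
at a different dilution.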

VI. Exemplary Production of OP-1 Polyclonal and Monoclonal
Antibody

Polyclonal antibody for OP-1 protein may be prepared as
follows. Each rabbit is given a primary immunization of 100
µg/500 µl E. coli produced OP-1 monomer (amino acids 328-431 in
SEQ ID NO:5) in 0.1% SDS mixed with 500 µl Complete Freund's
Adjuvant. The antigen is injected subcutaneously at multiple
sites on the back and flanks of the animal. The rabbit is boosted
after a month in the same manner using incomplete Freund's
Adjuvant. Test bleeds are taken from the ear vein seven days
later. Two additional boosts and test bleeds are performed at
monthly intervals until antibody against OP-1 is detected in the
serum using an ELISA assay. Then, the rabbit is boosted monthly
with 100 µg of antigen and bled (15 ml per bleed) at days seven
and ten after boosting.

Monoclonal antibody specific for OP-1 protein may be
prepared as follows. A mouse is given two injections of E. coli
produced OP-1 monomer. The first injection contains 100 µg of OP-1
in complete Freund's adjuvant and is given subcutaneously. The
second injection contains 50 µg of OP-1 in incomplete adjuvant and
is given intraperitoneally. The mouse then receives a total of
230 µg of OP-1 (amino acids 307-431 in SEQ ID NO:5) in four
intraperitoneal injections at various times over an eight month
period. One week prior to fusion, both mice are boosted
intraperitoneally with 100 µg of OP-1 (307-431) and 30 µg of the
N-terminal peptide (Ser293-Asn309-Cys) conjugated through the added
cysteine to bovine serum albumin with SMCC crosslinking agent.
This boost is repeated five days (IP), four days (IP), three days
(IP) and one day (IV) prior to fusion. The mouse spleen cells are
then fused to myeloma (e.g., 653) cells at a ratio of 1:1 using
PEG 1500 (Boehringer Mannheim), and the cell fusion is plated and
screened for OP-1-specific antibodies using OP-1 (307-431) as
antigen. The cell fusion and monoclonal screening then are
performed according to standard procedures well described in
standard texts widely available in the art.

VII. Exemplary Process for Detecting OP-1 in Serum

Presented below is a sample protocol for identifying OP-1 in
serum. Following this general methodology OP-1 may be detected in
body fluids, including serum, and can be used in a protocol for
evaluating the efficacy of an OP-1 modulating compound in vivo.

A monoclonal antibody raised against mammalian,
recombinantly produced OP-1 using standard immunology techniques
well described in the art and described generally in example VI.,
above, was immobilized by passing the antibody over an agarose-
activated gel (e.g., Affi-Gel™, from Bio-Rad Laboratories,
Richmond, CA, prepared following manufacturer's instructions) and
used to purify OP-1 from serum. Human serum then was passed over
the column and eluted with 3M K-thiocyanate. K-thiocyanate
fractions then were dialyzed in 6M urea, 20 mM PO4, pH 7.0, applied
to a C8 HPLC column, and eluted with a 20 minute, 25-50%
acetonitrile/0.1% TFA gradient. Mature, recombinantly produced
OP-1 homodimers elute between 20-22 minutes, and are used as a
positive control. Fractions then were collected and tested for
the presence of OP-1 by standard immunoblot using an OP-1 specific
antibody. Using this method OP-1 readily was detected in human
serum. See also, PCT/US92/07432 for a detailed description of the
assay.

IX. Considerations for Formulations and Methods for
Administering Therapeutic Agents

Where the OP-1-modulating agent identified herein comprises
part of a tissue or organ preservation solution, any commercially
available preservation solution may be used to advantage. For
example, useful solutions known in the art include Collins
solution, Wisconsin solution, Belzer solution, Eurocollins
solution and lactated Ringer's solution. Generally, an organ
preservation solution usually possesses one or more of the
following properties: (a) an osmotic pressure substantially equal
to that of the inside of a mammalian cell (solutions typically
are hyperosmolar and have K+ and/or Mg++ ions present in an amount
sufficient to produce an osmotic pressure slightly higher than the
inside of a mammalian cell); (b) the solution typically is capable
of maintaining substantially normal ATP levels in the cells; and
(c) the solution usually allows optimum maintenance of glucose
metabolism in the cells. Organ preservation solutions also may
contain anticoagulants, energy sources such as glucose, fructose
and other sugars, metabolites, heavy metal chelators, glycerol and
other materials of high viscosity to enhance survival at low
temperatures, free oxygen radical inhibiting agents and a pH
indicator. A detailed description of preservation solutions and
useful components may be found, for example, in US Patent
No. 5,002,965.

Where the OP-1-modulating agent is to be provided to an
individual, e.g., the donor prior to harvest, or the recipient
prior to or concomitant with transplantation, the therapeutic
agent may be provided by any suitable means, preferably directly
(e.g., locally, as by injection to the tissue or organ locus) or
systemically (e.g., parenterally or orally).

Useful solutions for parenteral administration may be
prepared by any of the methods well known in the pharmaceutical
art, described, for example, in Remington's Pharmaceutical
Sciences (Gennaro, A., ed.), Mack Pub., 1990. Formulations may
include, for example, polyalkylene glycols such as polyethylene
glycol, oils of vegetable origin, hydrogenated naphthalenes, and
the like. Formulations for direct administration, in particular,
may include glycerol and other compositions of high viscosity to
help maintain the agent at the desired locus. Biocompatible,
preferably bioresorbable, polymers, including, for example,
hyaluronic acid, collagen, tricalcium phosphate, polybutyrate,
lactide and glycolide polymers and lactide/glycolide copolymers,
may be useful excipients to control the release of the agent in
vivo.



As will be appreciated by those skilled in the art, the
concentration of the compounds described in a therapeutic
composition will vary depending upon a number of factors,
including the dosage of the drug to be administered, the chemical
characteristics (e.g., hydrophobicity) of the compounds employed,
and the route of administration. Where the morphogen-stimulating
agent is part of a preservation solution, the dosage likely will
depend, for example, on the size of the tissue or organ to be
transplanted, the overall health status of the organ or tissue
itself, the length of time between harvest and transplantation
(e.g., the duration in storage), the frequency with which the
preservation solution is changed, and the type of storage
anticipated, e.g., low temperature. In general terms, preferred
ranges include a concentration range between about 0.1 ng to 100
µg/kg per tissue or organ weight per day.

Where the therapeutic agent is to be administered to a donor
or recipient, the preferred dosage of drug to be administered also
is likely to depend on such variables as the type and extent of
progression of the disease, the overall health status of the
particular patient, the relative biological efficacy of the
compound selected, the formulation of the compound excipients, and
its route of administration. In general terms, a suitable
compound of this invention may be provided in an aqueous
physiological buffer solution containing about 0.001% to 10% w/v
compound for parenteral administration. Typical dose ranges are
from about 10 ng/kg to about 1 g/kg of body weight per day; and a
preferred dose range is from about 0.1 μg/kg to 100 mg/kg of body
weight per day.
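
The dose ranges quoted above mix several mass units, so a short unit-conversion sketch may help when comparing them. The following Python fragment is illustrative only: the function and constant names are hypothetical and do not appear in the specification, and the 70 kg body weight in the usage note is an arbitrary example rather than a recommended value.

```python
# Illustrative unit bookkeeping for the dose ranges quoted in the text.
# All names here are hypothetical; nothing below is a dosing recommendation.

NG_PER_UG = 1_000            # nanograms per microgram
NG_PER_MG = 1_000_000        # nanograms per milligram
NG_PER_G = 1_000_000_000     # nanograms per gram

# Ranges expressed in ng per kg of body weight per day.
TYPICAL_RANGE_NG_PER_KG = (10, 1 * NG_PER_G)          # about 10 ng/kg to 1 g/kg
PREFERRED_RANGE_NG_PER_KG = (100, 100 * NG_PER_MG)    # about 0.1 ug/kg to 100 mg/kg


def daily_dose_range_ng(body_weight_kg, range_ng_per_kg):
    """Scale a per-kilogram range to a total daily range in nanograms."""
    low, high = range_ng_per_kg
    return (low * body_weight_kg, high * body_weight_kg)


def percent_wv_to_mg_per_ml(percent_wv):
    """Convert % w/v to mg/mL (1% w/v = 1 g per 100 mL = 10 mg/mL)."""
    return percent_wv * 10.0
```

For example, under these assumptions `daily_dose_range_ng(70, PREFERRED_RANGE_NG_PER_KG)` scales the preferred range to a 70 kg subject, and `percent_wv_to_mg_per_ml(10)` converts the upper end of the stated 0.001% to 10% w/v range to mg/mL.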

The invention may be embodied in other specific forms without
departing from the spirit or essential characteristics thereof.
The present embodiments are therefore to be considered in all
respects as illustrative and not restrictive, the scope of the
invention being indicated by the appended claims rather than by
the foregoing description, and all changes which come within the
meaning and range of equivalency of the claims are therefore
intended to be embraced therein.
(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

TCAACCGGTC TCTTTAGGTT TTGGCTGTGC TTATTACTAT TCATTCAACA GGTACTAATT 60 

GAGCACCTGC TGTGTGCCAG GCTCAGAATA GGCTCAGGTG AGATGCACAA AGAAGGGTAA 120 

ACTAGAATCC TTGCTTAGAC ACTGACGGAT CAGTTGTTTC ATATGTAAAT TGTAGCACCA 180 

AGACCTGCTG CCCCTGCCCC CAGCCTCACC TGCTTGTGAA GATCCCTCCA AAAGATTTGA 24 0 

GAGTAGATA.A AAAGCAGAGA CTACTACTGA AG.^.^CAGGGC TGCTTTGGCT CCTTATTATT 3 00 

TCAGACTTTG GAAGAAAATG ACCTCCTTTT TCTCTACTGG CACTGAGTGC ATAGCTGACC 3 60 

TAGCAAGCCA GGCCTGGAGG GCGTGTGC AG GGCTGGGGAC CGAGCCTGGT TTCTGTTCCC 420 

TGCTCTGCAG CTCAAGCACT TGCTGTTCCT CC^.CCTGGG?'. TGCCTTTCCC TGGAAAAGCC 4 80 

TGTCTCTTTC TTGTCTTTCA GGACTCAGGT CAGTGGCATC TCCTCCAAAA ACTCCCCTTC 540
CCACCCTCCA TCACCTCACC CTGTTTATCT GCGCCCCCGC CCCCACTGCC TGTCACTTAT 

TGCAGGCTGA AGTGACCCAG GCTCTCCAGT TGTACACTCT CAGATGGACC CTGGACGACT 660 

GTGGCACTCC TGCAATTTCC CCAGTCTCCC TGGGGTAGGA TTCCTGCTTG CCAGGATGCC 720 

CACCTTTCCT TCTCCCTCCT GCATGTCCTC CTCTGCCTGG CTTCTGAATT GTTTCCAGAG 7 80 

AGAGTGATAG ACAAGATCTG CCTCTCCTTC AGTCCCTGAA TCTTATTTAA GGCTCTTGCT 84 0 

TTGCTTCCCT GGCCTGGAGG CGGCTCCTTG ATGGAGTCTG CCATGTGGGT TCGCTCATGG 900 

CCATGTCTTC CTGCCCAGCA TGGTGCTTGG CCCTGGGACT GGCCACATA.^ TATCTGGGCC 960 

AGGTGCAAAA. TTAGTACGGG GCAGGGGGTA CTTTGTTCAT AGGTGATTCA GAACCACATA 1020 

TGGTGACCTC AGAGTAGGAA ACCAAGTGTG GGGCCCTTAA GAGCTGGGGG GCCCTGTACG 1080 

ACTGTCCAGG TTGCAGGCCC CACAGCTCGC CTCCTGATAT CCTGTGCTCC ATGCTTGTCT 1140 

GTTGAAGGAA GGAGTGAATG GATGAAGAGC AGGTGGTGGG GGGTGGTTTG AGGGCCTTGC 1200 

TGGTGGGTGG GTAGAGGCCC CTCCCTGGCA TGGGGCTCAA GACCTGTTCC ATCCCACAGC 12 60 

CTGGGGCTGT GTGTAAATGG CCAGGACCTG CAGGCTGGCA TTTTTCTGCT CCTTGCCTGG 1320 

CTCTGGCTCC CCTTTCTCCA CCCATGTGGC CCCTCAGGCT GCCATCTAGT CCAAAAGTCC 1380 

CAAGGGAGAC CCAGAGGCCA CTTCGCAAJ^vC TACTTCTGCT CCAGAAAACT GTAGAAGACC 144 0 
ATAATTCTCT TCCCCAGCTC TCCTGCTCCA GGAAGGACAG CCCCAAAGTG AGGCTTAGCA
GAGCCCCTCC CAGACAAGCG CCCCCGCTTC CCCAACCTCA GCCCTTCCCA GTTCATCCCA
AAGGCCCTCT GGGGACCCAC TCTCTCACCC AGCCCCAGGA GGGAAGGAGA CAGGATGAAC 
TTTTACCCCG CTGCCCTCAC TGCCACTCTG GGTGCAGTAA TTCCCTTGAG ATCCCACACC 
GGCAGAGGGA CCGGTGGGTT CTGAGTGGTC TGGGGACTCC CTGTGACAGC GTGCATGGCT 
CGGTATTGAT TGAGGGATGA ATGGATGAGG AGAGACAGGA GAGGAGGCCG ATGGGGAGGT 
CTCAGGCACA GACCCTTGGA GGGGAAGAGG ATGTGAAGAC CAGCGGCTGG CTCCCCAGGC 
ACTGCCACGA GGAGGGCTGA TGGGAAGCCC TAGTGGTGGG GCTGGGGTGT CTGGTCTCAG 
GCTGAGGGGT GGCTGGAAAG ATACAGGGCC CCGAAGAGGA GGAGGTGGGA AGAACCCCCC 
CAGCTCACAC GCAGTTCACT TATTCACTCA ACAA_ATCGTG ACTGCGCACG TACAGTGGCT 
ACCAGGCGCT GGGTTCAAGG CACTGCGGGT ACCAGAGGTG CGGAGAAGAT CGCTGATCCG 
GGCCCCAGTG CTCTGGGTGT CTAGCGGGGG rAAG?^J^.GGCA ATAAJ.GAAGG CACGGAGTAA 
CTCAA.^CAGC AATTCCAGAC AGC.AA.GAGA.^. ACTACAGG.I.A AGAAAJ^CAAA CGTGCGAGGG 
GCGAGGCGAG GAAACAACCT CAGCTTGGCA GGTCTTGGAG GTCTCTGGGA GGAGAAAGCA 
GCGTCTGATG GGGGCGGGAG GTGGTGAGTG GGGAGAGGTC CAGGCGGAGG G.^_^.TGGCGAG 
CGAGAGACAG GCTGGCAACG GCTTCAGGGA GGCGCGGAGG GGTCAGCGTG GCTGGCTTAA 
A-AGGATACAT GGGACTAGGG GCAAGACCGG CTCAAGGTCA CCGCTTCCAG GACCTTCTAT 
TTCCGCGCCA CCTCCGCGCT CCCCCA^CTT TTCCCACCGC GGTCCGCAGC CCACCCGTCC 
TGCTCGGGCC GCCTTCCTGG TCCGGACCGC GAGTGCCGAG AGGGCAGGGC CGGCTCCGAT 
TCCTCCAGCC GCATCCCCGC GACGTCCCGC CAGGCTCTAG GCACCCCGTG GGCACTCAGT 
AAACATTTGT CGAGCGCTCT AGAGGGAATG AATGAACCCA CTGGGCACAG CTGGGGGGAG 
GGCGGGGCCG AGGGCAGGTG GGAGGCCGCC GGGGCGGGAG GGGCCCCTCG AAGCCCGTCC 
TCCTCCTCCT CCTCCTCCGC CCAGGCCCCA GCGCGTACCA CTCTGGCGCT CCCGAGGCGG 
CCTCTTGTGC GATCCAGGGC GCACAAGGCT GGGAGAGCGC CCCGGGGCCC CTGCTATCCG 
CGCCGGAGGT TGGAAGAGGG TGGGTTGCCG CCGCCCGAGG GGGAGAGCGC CAGAGGAGCG 
GGAAGAAGGA GCGCTCGCCC GCCCGCCTGC CTCCTCGCTG CCTCCCCGGC GTTGGCTCTC 
TGGACTCCTA GGCTTGCTGG CTGCTCCTCC CACCCGCGCC CGCCTCCTCA CTCGCCTTTT 
CGTTCGCCGG GGCTCCTTTC CAAGCCCTGC GGTGCGCCCG GGCGAGTGCG GGGCGAGGGG 3120
CCCGGGGCCA GCACCGAGCA GGGGGCGGGG GTCCGGGCAG AGCGCGGCCG GCCGGGGAGG 3180 
GGCCATGTCT GGCGCGGGCG CAGCGGGGCC CGTCTGCAGC AAGTGACCGA GCGGCGCGAC 
GGCCGCCTCC CCCCTCTGCC ACCTCGGGCG GTGCGGGCCC GGAGCCCGGA GCCCGGGTAG 
CGCGTAGAGC CGGCGCGATC CACGTCCGCT CACTGCGAGC TGCGGCGCCG CACAGCTTCG 
TGGCGCTCTG GGCACCCCTC TTCCTCCTCC GCTCCGCCCT GGCCGACTTC AGCCTGGACA 
ACGAGGTGCA CTCGAGCTTC ATCCACCGGC GCCTCCGCAG CCAGGAGCGG CGGGAGATGC 
AGCGCGAGAT CCTCTCCATT TTCGGCTTGC CCCACCGCCC GCGCCCGCAC CTCCAGGGCA 
AGCACAACTC GGCACCCATG TTCATCCTGC ACCTGTACAA CGCCATGGCG GTGGAGGAGG 3 600 

GCGGCGGGCC CGGCGGCCAG GGCTTCTCCT ACCCCTACAA GGCCGTCTTC AGTACCCAGG 
GCCCCCCTCT GGCCAGCCTG CAAGATAGCC ATTTCCTCAC CGACGCCGAC ATGGTCATGA 
GCTTCGTCA.^ CCTCGGTGAG TAAGGGCAGG CGAGGGTACG CCGTCTCCTT TCGGGGGCAC
TTTGAGACTG GGAGGGAGGG AGCCGCTTCT TCTATGCAGC CCGCCCAGCT TTCCGCTCCT 3 84 0 

GGCTGAAATC GCAGTGCCTG CCCGAGGGTC TCCCACCCAC AGCCCTATGA CTCCCAAGCT 3 900 

GTGTGCGCCC CCAGGTCGGG CCGCTGGGTC GGTGAGCCTG TAGGGGTTAC TGGGA.AGGAG 3960 
GGATCCTCCG AAGTCCCCTC CATGTTACGC CGCCGGCCGC ATCTCTGGGG CTGGAGGCAA 4020 
GGGCGTTCAA AGCGCGGGGC TCGGTCATGT GAGCTGTCCC GGGCCGGCGC CGGTCCGTGA 4 08 0 

CCTGGATGTA AAGGGCCCTT CCCGGCGAGG CTGCCTTGCC GCCCTTCCTG GGCCCCTCTC 
AGCCCTGCCT GGCTCTGGCA TCGCGGCCGT CGCACCCCCT TACCCTCCCT GTCA.AGCCCT 4200 
ACCTGTCCCC TCGTGGTGCG CCCGCCTTAG GCTACCGGCC GCTCCGAGCC TTGGGGCCCC 4260 
TCTCCGGGCG CCGATGCCCC ATTCTCTCTT GGCTGGAGCT GGGGAAGAAA CGGTGCCATT 4320 
GCTAATTTTC TTTGTTTTCT TTCTTTGTTT ATTTTTTTCT TTTTTCTTTT TTTTTCTTTT 4380 
CTTTTCTTTT CTTTTTTTTT TTTTTTGAGA CGGAGTTTCA CTCTTGCTCG CCCAGACTGG 
AGTGCAATGG CGCGATCTCT GCTCACCGCA ACCTCTGCCT CCCGGGTTCA AGCGATTCTC 
GTGCCTCAGC CTCCCGAGTA GCTGGGATTA CAGGCATGCG CACCATGCCT GGCTAATTTT 
GTATTTTAGT AGAGACAGGG TTTCTCCATG TTAGGCAGGC TGGTCTCGAA. CTCCCGATCT 
CAGGTGATCC TCCCGCCTCA GCCTCCCAJ^ GTGGTGCTGG GATTACAGGC GTGAAGCTGT 4 680 
GCCCTGCCGC TAGTCTTCTA TTTTAAGTAT TTAGTGGTAG GTCCCGGGCC GGCAGAATCT 4740

ATTTTCAGCA TTTACCACGT GTGGCGCGCk AACCACAGGT TTTGGCGATT GGGTTGCGCG 4800 

GGATCTCAGA GCTGACGACC GCGGGGGCCT GGGGGTCCCG GTTTCCGACT GGAGCCGCGA 4860 

CGACCCCGGC GACGGChGCC TGGGGCTGCA GCCGAGGGCC GGGGhGCTCC CCCTCCATAT 49 2 0 

GTGCGCGCAC ATTCTCCAGA CTTGCTCAAA CTAACCCCCC GGAGCAGCGC ACGGGCTGCG 4980 

ACTGATGATC AAATATTTGG TTTCCGAGAT AACACACCCC G AT AGCGCTG TTTCCTGAGC 504 0 

CGCTTTCATT CTACTTGTGT AACTTGCTGC GAAAA.CCCGA ACCAAGTCAA GACAGCAAAC 5100 

TCACGCCCAC GGGCCTGTGT C.^^.C ATGGA.^. ATAA.TGATAC TGAAGCCCCA CGCTGGGCAC 5160 

CTGGGGCGTG GACTGGGGGC GCGGGGGAAG CGCAGATCCG CCTTCATGCT TCCCCTCCTC 522 0 

CTGATAAGGT CCCTGGAGTT CCCGGGAGCC ATTGTCTGTA CTTA--.TAATA ACTA.-J^.TCCA 5280 

ACTAGTGAAC CAAGCTTCAG CGkGGCAJiGG GGAGGGAGGT TTAGATGCCA AAATTACCTT 53 40 

CA.\AAAAGTT TAAATTATAC TAJ^GCAGCCA GTTAAGA.AGG AAGCAGCAAT ATATGACCTG 54 00 

ATTTAG.AACC ATCTCCAAGA TGTATGAGGT GG.-„i.AGAAGC AAGGTGCAGA TGAGTGGGCT 54 60 

GCATGTGTGC TTGTATATCA TCGTGTCCTC CTGGAGG.-^G ACACCAGGAA CTGGAGAGAG 552 0 

ATTTTACTGG AGGGGTATAT GOCGGGGGCA TAGCTGGGGC TTACGGAGTG GGAGGTGGGG 5 580 

TCTGATTTTT CGTCGTCTGC ACTTCTGTAT TTGTGATTTT TTTAAAACAA TGTGTATTTA 564 0 

TTAACTATAC CAA_AAAATAA AGGAAAA.TTC CA-^^ATACATA CATATAAATA ATGA.^CCGCA 57 00 

GAGCTCTGTC GCCCTCCTGA AGCCTGGGGT TAGCCAGGGC CCTTTCTCTG GTGGGGG ATT 57 60 

TATAGCATCT TCCCTTCTGT TGGGTACCCC GGACTCCCAC TG.AATGTGCA GGTCCCAGTG 582 0 

GCTGCCTTCA GAGCCTGGCT GGAATCATTA AAAAGGTATT TGTA.2.TCTCT GGCTTCTGCA 5880

GAAGGCCCTG CAAACCAAGA GCAAAAAAGC CCCCAGTGCT TATGGGCCGG CAGTGTGGGC 594 0 

TAGGCCCGGG GCTCCCTGTC CCCAAGAGAA AGACCAGGTT GCTCGGAGGG TGCCTCTGGG 6000 

AACTTTGGTG CGGGCTATTT GCTCCCCCCA TGGCGGCAGG AGCAAGCTGG GACTTGTTTG 6060 

GGAAGGCCAC AGCTGGGTGG TTTTCCTCCT CTGGCTGTAC ATACACCTTT CAATCCATTT 6120 

CTTTCATCTT GAAAGGACAA AGACCGGCTT GTCTGAGCCT CTTAATCAGT CAGGCTGGCT 6180 

TTGGGCTTTG GGGACCCTGA CTTTCTCAGG TCTAGCTTTC TGGGACATCA CTCCAAATTA 624 0 

GATGGCAGAG TGGCTTTTAA CAGAGCGCAC TGACCTTGTT TTCTTTCTCT CTCTGTCCCT 6 3 00 
AAACTCGAGG TCATTAGTTA GGTGAAGACC TGGGCTGCAG TTTGGCGAGA CACTTCCTGT 6360
AGATGCTTCT AATGTTGGCC TTTAATTTCT GCTAAGCAGC AGCACACAAA TAAATGGCCT 6420
GTCCCTTCTA TCCTGTTGTA GCTTGGAATT TCTCCATAGG AGGGACTTGG GGGTGGCAGT 6480
AGGGTTGGAG AGGGTTGGGG GGAGGTGTAG GAGACTTGTC TGGCCACTGA GTTTGCTGAG 6540
AAAGTACTGC TATAGTGTTT TTCCTTGGAT TGCAAATCAT GTTGATCTGA ACTGCTGATT 6600
TGAAGTGGAT TGAGAGGATG GAACAATAGA AGGAGGATAT GGCTCAGGAC AGTCAAGTAC 6660
TGGAAGAGGG AAAGGTACAA AGAGGTGTTG GCACTGAATG ACCCTGAACA GGGCTGCCCT 6720
GGAAATATCA GAGGTGAGTG ACAAAGAGAA CTCTAGTCGA AGGTCTGGAA GTCAATTATT 6780
GTCTCCAGCT TTTGTCCCAC CCTAAGGGAT GGAGCATGAA CTTCATGCAT GTAACATCCC 6840
TCCAGGAGCG CTGAGGTTCT GGGAATTCCC AGTGCTGGCT ACCATGCCAT TCTTTTCTCA 6900
TTCACTCAAG AGCGTATTGG GATATGCGTG CATGAJ^GCA ATGTAATTAT GGGCACAACC 6960
TCAAAACCTG CTCTAATTTT TTTTTTTTTT GGAGATGGAG TCTCGCTCCA TCACCCAGGC 7020
TGGAGTGCA.^ TGGCGCGATC TCAGCTCACT GCAAGCTCAG ACCTCCAGGG TTCACACCAT 7080
TCTCCTGCCT CAGCCTCCCG AGTAGCTGGG AATACAGGCG CCCGCACCAT GCGCGGCTAA 7140
TTTTTTTGTA TTTTTAGTAG AGACGGGGTT TCACTGTGTT AGCCAGGATG GTCTCGATCT 7200
CCTGACCTCG TGATCCACCC GCCTCGGCCT CCCAAAGTTC TGGGATTACA GGCGTGACAG 7260
CCGTGCCCGG AATCTGCTCT AATTTTTTAA AGATATCATT TGCAAACTTT GGGCACTTGA 7320
GTCACTCAGT AAGATATTAT TTACAACCCC ACCATAGATT CAAACCTCTG TCCTAGAATG 7380
TTGTCGAGTT AGGCATCTGG CTTGCAGCAA CAGCTCGCTT TCCTGTCTAT GCTGTCTCCT 7440
TCCAGGGAGG ATGTTTCACC CTTCATATTG AGGA.^ATGGG CACAGAGAAC CCATTTCTCT 7500
TACTCATCAT GTAACTTCAG TGGGATGGTC AGATCTATCT TTAACCTGGC CACTCTTCCA 7560
CAAGCTCACA CTGACTCCAG CAAGATCTTA AACTAGAAGG CAGGAGTTCA AATCCTAGCT 7620
GGTGCAGTGG CCAAATCTCG GCTCACAGCA CCTTCTGCCT CCTGGGCTCA AGCGATCCTC 7680
TGACCTCAGT CTCCCAAGTA GCTGGGACCA TAGGCATGCA CCACTATGCC TGGCTAATTT 7740
TTGTATTTTT GTAATTTTTT GTAGAGACAG AGTTTCACCA TGTTGCCCAG CCCAGTCTTG 7800
AACTCCTGGA CTCAAGCAAT CTTCCCACCT TTGCCTACCA GAGTGCCGGG ATTACAGGTG 7860
TGAGCCATCA TGCTAGTTGC GCACAGTTGG GCGAAACTGA CAGATGAGAA AGCAGAACCT 7920
CGTGAGTCCA CTCAGTAAGA GACTCCCTAC TTTCTTTCTG AGTCTTTGTT TCTCATCAAT
TGAATGGCAA TAAACAACTT GGTGGCCC^^-A. GAGTTGATGA CAACAGTCCT ATAAGATTAT 
ACATGTAAAA GAAACAGAGT ATTCTACAAA TATCAGTTAT TGATAGTTCA ATAGGCAACC 
TGACATTACC TTTTCTTGGA ACTTGATGAA CAACTCAGAA ACTCATTAAT ATCAAACCCA 
ATGGTGAGCA CTTGGTCTTT ATTTATGGCT GTAAGAGAAG AAATTGAATT AACTCTATGT 
AAATGCCAAC TAAGAACATC GAAGTCTGAA ATCAACAGTT TTCCTCGCTC ATACGACACA 
CCCAAACTCA AGCAGTGGTT CCAAGCCCCT TTGGAAAATA CCATGGGCTA ACGACTTTAA 
AAGCTTAG.A.:^ GTGAATTCTA CTTACTTATT ACTTA;^J^GT GGTTCTCAAA CTTCAAGGTG 
AATCAAAATC ATCTGTAGAG CTTGTT.AA.AA CACAGGTTGC TGGTCCACCC C.AAGAGTGTC 
TTGAGTCAGT AGGTCTCA.^.G TAGGGCTCA.S GAATATGCAT TTCTPJ-.TGAG CTCCAGGTGA 
GTCTAAGTGT TAGTCGTCGG TCTTGGGACC ACAACTTTGG GAACAATTGA TTTAG.AAGAA 
CTCI^J^J^.GATC AGAAAGGGGT GGAATATTTT TAA.A.i.TTGTG GTAAA-^TACG CAT.AAACAGA 
AA.^GGTACAA TTTTAACCAC TTAGAGAGAG GTGGGATCTA AGA.^CAGAA.^ TTGTTATGCC 
ATCAAAGGTG AGTTCAGATA AGCATTATTA A^^TCGTATCT ATGGAT;^-:^.:^C TTCAGGGGCC 
CTGTGGAGCA ACCCAATGCT GGGATGGGGT CCAGGTGTGC TATGGTTTGG ATGTGGTTTG 
TCCCTACAA^. AACTCATGTT GA.^A.TTTA;^.T TGCCAGTGTA ACATTATTGA GAGGTTATGG 
ACTTTTAA.GA GGCATTTGGG TCATGAGGGA TCCACCTTCA GGGATTAGTG cagtctccag 

ggagtgagtg agttcccatt ctagtgggac tcgattagtt accatacagt ggttgttata 
^-^gtgaggct gcttctggtg ttttatctgt ttgcaggcac ttccttcccc ttccacttct 
ctgccaggtt aggatgcagc atgaggccct caccagaagc tgaccagatg tggctgcctg 
atcttgaact tcccagtccc cagaaccatg agctaaataa accttttttc tctataaatt 

ACGCAGTCTA GAGTATTCTA TTATAGCAAC ACAAGACAGA CTAAGACACA GTGGTAGAAA 
GAACACTACT GACTTCTCCC ATACTCTGGC CTATGGACAA GAGTGACAGA CAGACAAGAG 
TGAATATCAG GGCCCTCAGG CACATTCCTC TCTGCCCCTT CCTCCCTTCT TGCAGAGTCT 
CCAGTGACTG CCAGCTAATG CTATCATAGA CCCCACCTTT CCCCTGACTT GATTGGACCA 
GAAGCAGCCT CCTGATCCAT GGCCAAC.^T CAGATTCACT TTCAAGAATT TGAACTAA.GA 
GACACTAGGA AGATGGCCCT TGAGCTGTGA GTCCTACACT TGAAAGTTCT TAGCATCTTG 
GTCAGGTACC CACCAGGGCC ATGTGCAAAC TGAGATAATG GGGACATGGA ACAAGGGTAA 9600

GTGGAGAGGG CTGGCTGGAG AGAGACGGGC AGAGGAAAGC CCTGCCAAGA GGAGCAGAGA 9660 

TGAGAGACCT TGGAGGGAGA GGTAATAAAA GGAGGCAAAG ATGATTTTCC ATGCTTACAA 9720 

CTCACAGCTG AGGCCTAACT ATCTTTATGT CC ATAAGAGG CATCCTTGTG TCGAACCTCT 9780 

CCTCTTTCTT GGGTCAATGG GGGATGGTTG CAAGGGACCA TCAGTAGGAA GGCATAGTAC 9840 

ACTAACCCAG TCTGGGGTGG GCTTTTAGAC TAGTCTTCCT CCCATGCTCC TCCTCCCATT 9900 

GGAACCCCGG ACTTTCAAGA CTGCTACCTA GCACACCAGT GCACCAGATG TCACTCAAAA 9960 

CCTCTTCAGC AATGGCCCAC TCACCTTCAA AAAGGCTGAA GAGCAGACTG GCTGGGTTCT 1002 0 

TCATGGTGGA GGGGC AGTCT GGGAGGTTTT AAGGTTGAAG ATGAAAACTT TCACTTTTGG 10080 

CTCAATGGTC TGAAAAAGAG AAGGACCAGC A.^GTGAACTG AAGCCTCCTG GAAJVGCATCT 10140 

TGATAACAGG GGCAGAGTTT CAAGATGAGA AGCTGTGGCA CTTACTCTGG CTTTGGAAAT 10200 

GACCTCTAAG TATCTCAGTT AATTAAAGGA GTCAAACTCT AGACTCGAAG GAGAAGATCT 10260 

AC.AATTTTCA ATAACATAGT CTACCCTCCC CTCCTTCCCC CACCTTCACC TCTTCTTTCA 10320 

TCACAGGCTT ACAGGGCACC TCTTAGAGCC AGGCACGGTG TTGGGATCAG GAACAAGGCC 10380 

ACTGCTCACA TCCAGAGCCT GTGCTACTTA AG.^GCTTCC AGGACCTCTT GGATGGCTGT 10440 

GGTTAGTGCC CTACTTTTCC CAGCAGGTTG GATGCAGAAT CATGCTCTTG TCGTTCAGGA 10500 

TGACCATGGG GACCATGGGT CTGAGCCTGT GACCCTCCAG TCTACAGTGT GTTGGTGAGG 10560 

AAGGAGCAGT TGTCACTGGG GTCACTGGCA ATGGGCATGC CTCCATCTAG CTTAGGCAAG 10620 

ATGCTTAGAC TCAGAGCCAG AGAGTGAAA.C CCAGACACTA ATGAGCTGTC GGTGTTGGTG 10680 

TGTGTTCTCT TCCTCTTCCA GTGGAACATG ACAAGGAATT CTTCCACCCA CGCTACCACC 10740 

ATCGAGAGTT CCGGTTTGAT CTTTCCAAGA TCCCAGAAGG GGAAGCTGTC ACGGCAGCCG 10800 

A.i^.TTCCGGAT CTACAAGGAC TACATCCGGG AACGCTTCGA CAATGAGACG TTCCGGATCA 10860 

GCGTTTATCA GGTGCTCCAG GAGCACTTGG GCAGGTGGGT GCTATACGGG TATCTGGGAG 10920 

AGGTGCTGAG TTTCCTCTGG GGGCAGAGGA AGAAGGTGGT GAGGGTTTCC CTCCCCTCCC 10980 

ACCCCATGAG CTCTGCTTCC CATCTGTTGG GGTAGTGGAG CTGTGACCTG CTAACGCGAA 11040 

GCCCGTGTCT CTCCTCCTCT CTCGCAGGGA ATCGGATCTC TTCCTGCTCG ACAGCCGTAC 11100 

CTCTGGGCCT CGGAGGAGGG CTGGCTGGTG TTTGACATCA CAGCCACCAG CAACCACTGG 11160
GTGGTCAATC CGCGGCACAA CCTGGGCCTG CAGCTCTCGG TGGAGACGCT GGATGGTGAG
TCCCCCGCCA CTGCCAGTCC TAATGCAGCC TGTGCTCCTG GACTTCAGGA GGGTCTCAGC 
AGTGCTCATG CTTGCTTCAC TACAAACAGG CTTCCCCGCC CCTCCCAACC AGTACTCCAT 
GTTCAGCCTT TTGATCCTGC AGCCCTGTCC CGCTCGTGGC CCTCCTGTAA CTGCTCTTCT 
GTGCACTTGG CTGCTTCCTG TCCAGGGCAG ACGATCAACC CCAAGTTGGC GGGCCTGATT 
GGGCGGCACG GGCCCCAGAA CAAGCAGCCC TTCATGGTGG CTTTCTTCAA GGCCACGGAG 
GTCCACTTCC GCAGCATCCG GTCCACGGGG AGCAAACAGC GCAGCCAGAA CCGCTCCAAG 
ACGCCCAAGA ACCAGGAAGC CCTCGGATGG CCAACGTGGC AGGGTATCTT AGGTGGGAGG 
GATCACAGAC CCACCACAGG AACCCAGCAG GCCCCGGCGA CCGCAGGAGA CTGACTAAAA 
TCATTCAGTG CTCACCAAGA TGCTCTGAGC TCTCTTCGAT TTTAGCAAAC CAGGAGTCCG 
A-AGATCTAA.G GAGAGCTGGG GGTTTGACTC CGAGAGCTCG AGCAGTCCCC .A-AGACCTGGT 
CTTGACTCAC GAGTTAGACT CCACTCAGAG GCTGACTGTC TCCAGGGTCT ACACCTCTA.A 
GGGCGACACT GGGCTCAAGC AGACTGCCGT TTTCTATATG GGATGAGCCT TCACAGGGCA 
GCCAGTTGGG ATGGGTTGAG GTTTGGCTGT AGACATCAGA AACCCA.AGTC AAATGCGCTT 
CA^^CCAGTAG AAAATTCACC AGCCCGCAGA GCT?-J^.GGTTG GGTGGACATT AGGGTTGGTT 
GATCCAGGAG CTCAACAGTG TGCTCTGAGC CCCAGCTCCT TCTGCCCCAC CCCACCATCT 
TCAGTGCTGC TTCCTCTCAA GGCCACAGCT GTAGTTGGCC AGGGGGGCTT CATTATTTTT 
TGCTCCTGGG CAGTAGGAGG AAGAGAATGA ATGTCTCTCC ATGGGTCTTT CTTAGGAATG 
TGGGA.ACTTT TTCCAGAAGT CTCTATGTCT TTTAGTTTGT GTTGGGTCAC TTGCCCTTCC 
TGAACCACTT CCTGACTCCT GGACAGGATG TGCACTGATG AGCTTAGCTT TGGGGATCTA 
ATAGTGACTT TACAAAGCCT CTTTGAGA^^.G GTGACATTGG AACCAAGGCT TGAGCAGACA 
CAACAAAGAT TGCAGGGAGG GGCATTGCAG GTGGAGGAAA CGGCACATGC AAGAGCCCTG 
CGTGGGAGTG AGCTTGGTGT TTGGTCAATC AGTTGTCAGA GCACACCGGG CCCTGTCAGC 
AGGCACAGCC TGGGCCTGCT CTGAGTATGA CAGAGAGCCC CTGGGAAGTT GTAGGTGGAG 
GAAJ^GACAGG TCATGACTAG GAAAAAAGCA ATCCCTCTGT TGTGGGGTGG AAGGAAGGTT 
GCAGTGTGTG TGAGAGAGAG ACAAGACAGA CAGACAGACA CTTCTCAATG TTTACAAGTG 
CTCAGGCCCT GACCCGAATG CTTCCAAA.TT TACGTAGTTC TGGAAAACCC CCTGTATCAT 
TTTCACTACT CAAAGAAACC TCGGGAGTGT TTTCTTCTGA AAGGTCATCA GGTTTTGACT 12840
CTCTGCTGTC TCATTTCTTC TTGCTGGTGG TGGTGATGGT TGCTTGTCCC AGGCCCTGTC 12900
CCGCATCCTC TTGCCCCTGC AGAGGGATGA GTGTGTTGGG GCCTCACGAG TTGAGGTTGT 12960
TCATAAGCAG ATCTCTTTGA GCAGGGCGCC TGCAGTGGCC TTGTGTGAGG CTGGAGGGGT 13020
TTCGATTCCC TTATGGAATC CAGGCAGATG TAGCATTTAA ACAACACACG TGTATAAAAG 13080
AAACCAGTGT CCGCAGAAGG TTCCAGAAAG TATTATGGGA TAAGACTACA TGAGAGAGGA 13140
ATGGGGCATT GGCACCTCCC TTAGTAGGGC CTTTGCTGGG GGTAGAAATG AGTTTTAAGG 13200
CAGGTTAGAC CCTCGAACTG GCTTTTGAAT CGGGA.\ATTT ACCCCCCAGC CGTTCTGTGC 13260
TTCATTGCTG TTCACATCAC TGCCTAAGAT GGAGGAZ^CTT TGATGTGTGT GTGTTTCTTT 13320
CTCCTCACTG GGCTCTGCTT CTTCACTTCC            AGAGAACAGC AGCAGGCACC 13380
AGAGGCAGGC CTTGTAAGAA GCACGAGCTG TATGTCAGCT TCCGAGACCT GGGCTGGCAG 13440
GTAAGGGGCT GGCTGGGTCT GTCTTGGGTG TGGGCCCTCT GGCGTGGGCT CCCACAGGCA 13500
GCGGGTGCTG TGCTCAGTCT TGTTTCTCAT CTCTGCCAGT TAAGACTCCA GTATCAAGTG 13560
GCCTCGCTAG GGAAGGGGAC TTGGGCTAAG GATACAGGGA GGCCTCATGA AATCCGAGAG 13620
CAGAAATGTG GTTGAGACTT GAACTCGA-^C CAGGAACCCA AACACTTTGG ACTCTGA.^CC 13680
CCATTCTCTG CATGCACCTC ATTCCCATCC CTTGGCTGGC TGCTTCTCAA GATGATGCCG 13740
GGCCGTGTGT TTGAATGTAG ATACCTGGGG AGCCATCTCC CCCTCTGCCC TCTGACTTCA 13800
TTTACCCCAT TCCCATTCCC ACGGGAGGGA CGGATCTCCC CAGCTTGGTT CAGGCGCTTG 13860
TTCCTGAACC AGTCAACTGT TTCAGGGGTG GGGTCATGTT ACTGGCACAT GGCTGCCCCC 13920
TCTGGAGCCA TTTGCATGGA GTGAGGCAAA AGGCAGGGGA TGAATCTAGG AGAGGAGTGA 13980
GGGTCATGTG ATCCACCTGC CGTGAGCTCT GGATCGTGAT TCTCATTCAG CAGTCACGAG 14040
CATCTCGAGC GTTCTGGGCC CTGTTCTAGG TACTGGATTG GAGATGCAGC GATGAACACT 14100
GCAATGTGTC TGCCCTGTGG GGCTCAAATA TCCCTGGAGA GGGTATTGTC ATGAGGTCAT 14160
CAGGGCAACT GGTGGTATTC TACCCTCAGG GAGCTTGTAG TTCAGTGGGA GAGTCCAGAA 14220
TCTTCCCTGG GGATTATGCC CAGACACACT CAGGGCGTAC GTGCACACAG CCAGCTCTGA 14280
GCCCTCCTGT GAGCCTGCCC TCAGGACTGA TGACCACATC TACCTGCAGC TGGGACAG.A?^ 14340
CCCAAACTCC AGGGGCCTCT GCTGGAAGAT TCCATGTGCT TAAGCATCAC TGAGGAGTAT 14400
ATTGATTATT GGGCAACATT TCTGTGCCAC CCAGACCCTA GAGGCAAGGA TGGCACATGG 14460
ATCCCTTACT GACCAGTGCA CCCGGAGCCA GCATGGGTGA TGCCATTATG AGTTATTAGC 14520 
CTCTCTGGCA GGTGGGCAAA CCGAGGCATG GAGGTTTGTT TAAGGTGAAC TGCCAGTGTG 14580 
TGACCACCTA GTGGGGGTAG AGCTGATGAT TGCCTCACAC CGGAGGCTCC TTCCTGTGCC 
GCGTTCTGTC CAGAAGACAC AGCCATGGAT GTCCATTTTA GGATCAGCCA AGCCCGTGGG 
GCTTTCCTTC ATTTTTATTT TATGTTTTTT TAGAAATGGG GTCTTGCTCT GTCACCCAGG 
CTGGGGTGCA GTGGTGTGAT CATACGTCAC CGCAGCTTTG AGCCGTCTTC CCACTCAGTC 
TACTAAGCTT GGACTATAGG_ CCAAGACTAT AGAGTGGTCC TTCTTTCCAT TCTTTTGGGA 
CCATGAGAGG CCACCCATGT TTCCTGCCCC TGCTGGGCCC TGCTGCTCAG AAGGCATGGT 
CTGAGGCTTT CACCTTGGTC GTGAGCCTTC GTGGTGGTTT CTTTCAGCAT GGGGTTGGGA 
TGCTGTGCTC AGGCTTCTGC ATGGTTTCCC ACACTCTCTT CTCCTCCTCA GGACTGGATC 
ATCGCGCCTG AAGGCTACGC GCGCTACTAC TGTGAGGGGG AGTGTGCCTT CCCTCTGA.AC 
TCCTACATGA ACGCCACCAA CCACGCCATC GTGCAGACGC TGGTGGGTGT CACGCCATCT 
TGGGGTGTGG TCACCTGGGC CGGGCAGGCT GCGGGGCCAC CAGATCCTGC TGCCTCCAAG 
CTGGGGCCTG AGTAGATGTC AGCCCATTGC CATGTCATGA CTTTTGGGGG CCCCTTGCGC 
CGTTAAAA^.^ AAATCA^^AAA TTGTACTTTA TGACTGGTTT GGTATA-A.AGA GGAGTATA.AT 
CTTCGACCCT GGAGTTCATT TATTTCTCCT A.ATTTTTAA.:^ GT.AACTAAA.^ GTTGTATGGG 
CTCCTTTGAG GATGCTTGTA GTATTGTGGG TGCTGGTTAC GGTGCCTAAG AGCACTGGGC 
CCCTGCTTCA TTTTCCAGTA GAGGAA^CAG GTAAACAGAT GAG.A.2^_ATTTC AGTGAGGGGC 
ACAGTGATCA GAAGCGGGCC AGCAGGATAA TGGGATGGAG AGATGAGTGG GGACCCATGG 
GCCATTTCAA GTTAAATTTC AGTCGGGTCA CCAGGAAGAT TCCATGTGAT AATGAGATTA 
ACGTGCCCAG TCACGGCGAC ACTCAGTAGG TGTTATTCCT GCTCTGCCAA CAGCAACCAT 
AGTTGATAAG AGCTGTTAGG GATTTTGTCC TTTTGCTTAG A.ATCCAAGGT TCAAGGACCT 
TGGTTATGTA GCTCCCTGTC ATGAACATCA TCTGAGCCTT TCCTGCCTAC TGATCATCCA 
CCCTGCCTTG AA.TGCTTCTA GTGACAGAGA GCTCACTACC AGGACTACTC CCTCCTTTCA 
TTTAGTAATC TGCCTCCTTC TTTTCTTGTC CCTGTCCTGT GTGTTAAGTC CTGGAGAAAA 
ATCTCATCTA TCCCTTTCAT TTGATTCTGC TCTTTGAGGG CAGGGGTTTT TGTTTCTTTG 
TTTGTTTTTT TAAGTGTTGG TTTTCCAAAG CCCTTGCTCC CCTCCTCAAT TGAAACTTCA 16080

AAGCCCTCAT TGGGATTGAA GGTCCTTAGG CTGGAAACAG AAGAGTCCTC CCCAACCTGT 1614 0 

TCCCTGGCCT GGATGTGCTG TGCTGTGCCA GTATCCCCTG GAAGGTGCCA GGCATGTCTC 16200 

CCCGGCTGCC AGGGGACACA TCTCTATCCT TCTCCAACCC CTGCCTTCAT GGCCCATGGA 16260 

ACAGGAGTGC CATCGCCCTG TGTGCACCTA CTTCCATCAG TATTTCACCA GAGATCTGCA 16320 

GG ATCAAAGT GAATTCTCCA GGGATTGTGA AATGATGCGA TTGTGGTCAT GTTTAAAAGG 16380 

GGGCAACTGT CTTCTAGAGA GTCCTGATGA AATGCTTCCA GAGGAAATGA GCTGATGGCT 16440

GGAATTTGCT TTAAAATCAT TCAAGGTGGA GCAGGTGGGG AAGGGTATGG ATGTGTAAGA 16500 

GTTTGAAATT GTCCATCATA AAATGTGTAA A^^GCATGCT GGCCTATGTC AGCAGTCACA 16560 

GCCTGGAGGT GGTAACAGAG TGCCAGTCAC TGATGCTCAA GCCTGGCACC TACAGTTGCT 16620 

GGAAACCCAG AAGTTTCACG TTGAAAACAJ^ CAGGACAGTG GAA.TCTCTGG CCCTGTCTTG 16680 

A.-.CACGTGGC AGATCTGCTA ACACTGATCT TGGTTGGCTG CCGTCAGCTT AGGTTGAGTG 16740 

GCGGTCTTCC CTTAGTTTGC TTAGTCCCCG CTATTCCCTA TTGTCTTACC TCGGTCTATT 16800

TTGCTTATCA GTGGACCTCA CGAGGCACTC ATAGGCATTT GAGTCTATGT GTCCCTGTCC 16860 

CACATCCTCT GTAAGGTGCA GAGAAGTCCA TGAGCAAGAT GGAGCACTTC TAGTGGGTCC 16920 

A.AGTCAGGGA CACTATTCAG CAATCTACAG TGCACAGGGC AGTTCCCCAA CAGAGAATTA 16980 

CCTGGTCCTG AATGTCGGAT CTGGCCCCTT CCTTCCCCAC TGTATA^.TGT GA.A.AACCTCT 17 04 0 

ATGCTTTGTT CCCCTTGTCT GCAAAACAGG GATAATCCCA GAACTGAGTT GTCCATGTAJ^. 17100

AGTGCTTAGA ACAGGGAGTG CTTGGCTTGG GGAGTGTCAC CTGCAGTCAT TCATTATGCC 17160 

CAGACAGGAT GTTTCTTTAT AGAAACGTGG AGGCCAGTTA GAACGACTCA CCGCTTCTCA 172 20 

CCACTGCCCA TGTTTTGGTG TGTGTTTCAG GTCCACTTCA TCAACCCGGA AACGGTGCCC 17 280 

AAGCCCTGCT GTGCGCCCAC GCAGCTCAAT GCCATCTCCG TCCTCTACTT CGATGACAGC 17340 

TCCAACGTCA TCCTGAAGAA ATACAGAAAC ATGGTGGTCC GGGCCTGTGG CTGCCACTAG 17400 

CTCCTCCGAG AATTC 17415 
(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2298 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..2298

(D) OTHER INFORMATION: /note= "MOP1 UPSTREAM SEQUENCE"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
TGCATAGGTC ACACATCCCT CCTCTACCCA AGGCTAGCCA GGTGCCCTAT CTCTCCCTTC 
TTCGGTGCCT CCTCCGACTG GGCTCTGACG TTCTCAGAGA GAACGAAAGG GAA.2^GACTGC 
TTGCTACCCT TCTGTTCCGG ACCTTACTGA AGGGCCTTAG TGTTTCCAGG GGCCCA.:iGA.:^ 
CCAGGTAGCC GTGGAGGTTG CCATGCTTGC CTGCCCACTC ACCCAACGTT CTCCTGCCTG 
GCCTGTGTGT GGCACCCATG CAGGCC AC AG AAGGCCACAC ACAGCCTTCA GGATGAGGCA 
GGGCCCCTTT GTTTATTCAA TkTCPJiGkAC TGTAACGTGG TCACCGGAGG TCATGTCTCC 
AGCTCGCAGC CTGCTTGGCC TCAGATCACC CACCACAGCA GGTCCAGGG^ GGGGCCTCTC 
AGGTCTGCAC TGGGCCAGGG ACTCAGTACT GGTGGGCATC CAAGGCCTGG GCTAAGACCT 
GCAJIGTTTCT TTTAGCCCCT CAGACAGTCA CATCACCTA.i .^^TTCCTACC AAGGAGCCCT 
GAGAGACCTA GGTAGTTATC TCTGTTCCAG GAAGCCTGA.^ AGACCAGGCT TCCCATCTCA 
CCCTAGGACT TCAAGAGGGA CCCCCTACTC A-^^GGCCCTTC CCCAGCCCCT ACTTGCCATT 
TTAccAcccc tg;^j^cgctt GCTTGTCGCC CACCTTCAGC AAAGCAGGA.^ GCCTGGCTCA 
CC ATCCCCAC TCACTCACTG CCATTCTGGG TGA.2^GGCTGC TTTGCTCCCA TTTTTCAGAT 
TAGG.AAACGG AGGCTCCAAJ^ GAGCAGCAA.T CCACTGAGAG ACCCAGTATC TGTCTGGGAC 
GTTTCCTCCT GGGAGGAGAG GGAGGCTAGT CCTTTGAGAC AGGAAAATCG AGTCGGGAGC 
TCTTCTGAAC TTGGGTACCA ACTGCCTACT CCTCAGGCCC CTGACCTGGG GCTAGGGGTA 
GGGGTTATTA GACAGTGAGG TACCAAAGGA CTCATGTCAG GACCCCGCCC CCCCAAGAGA 
GGAGGGGGTG GGAACATTCT CTAGTCCCAG ATTTCACTTA TGTACTCTGT AGAGCTGCAG 
CATCTGGGGT TTGAAGGCTT TGGGTTAAAA GATACTTGGG AAGGAAAAGC CGAGAAGTAC 
CTGGGCCCGG ATCCCTTGGG TGCTGGACTT GAGGGGAGGT GTGTGTGTGT GTGTGTGTGA 
GTGTGTGTGT ATGTATGTGT GTGTTGGGGG AGTGAAGTGT AGAAAGAACT TTATCTCCAC 
ATTATCTCTG CCCGTCCTGG AAGGTTCCCA GAGGAAGTGG CACCCGAGGG GGGAGGGGCA 1320

GGGAG AACGT TCCCCGAGGA ACAAAAGCCA GGATAGCAGA GGGGCAAGCG GTGGGGGTAC 13 80 

CGAGGGGGTT TTGCATGACT GGAGCAAATG GAGTGTTGGG GGGGGCGGTT CGAAAGATGA 144 0 

GCCAGGTCCA AGAGTGGCCA CCTCCGAGGA GCCTTCTCGG ATTCCTGCGC TCCCTCCTGG 1500 

ATGCTTTCCT AGCACAGCCC TTAGTTGCTA CACTTTGGCC ACTTCCAAGT GCGAGTCCCG 1560 

AGAGAGCTGG GCAGATTGGG ATTCTTCTCT CTGGGTCCCT GCGGCGTCTG TCCCAGTGCC 1620 

GGACACCCGG TGGGCACTCG GTAAATATTT GTAGAGCGCC CTGGGAGGAA TGAATGAAGC 1680 

CATTGGGCCA GGCTTGGGGA GGGCGGGGAC AGGCGCAGGT GGGAGGCAGC GGGAGCGGGA 17 4 0 

GGGGCGGGGA AGTCAGTCCT CCCGCTCCTC CCCCGCTCCC CGGCCCCAGC GCGCCC A-^CT 1800 

CCGGGGCTCC CGAGGCGGCG GGCGGGCGAT CCGGGCGCGC AGGGCCCTTG TATTGGGCAC 186 0 

GCGGGAGATC GGAAAGGGGT TTGTTGCTGG TGCCCGCGGG CCTGAGCGCG ATCAGAGCGG 19 2 0 

GAGGAGGGAG CTAGGGTTCG CTCAGCGCCC AGCTGCCTCT CCGGCACTCG CTCTCCGGAC 1980 

TGTAGGTCTG CAAGCTGCTG CTCCTCCCAC CCCGGCCCGC CTCCTCGCTC TCTTGCTCGC 2 04 0 

TCTCTGGAGT TGCTGTGCTA GCCTTGCCGT GCGTCCTGGC GAGTGCGGGC CGAGGGGCCC 2100 

CGGGCCAGAA CTGAGTAAAG GACAGGGGCG TCCCGGGCAA AGCGCAGCCG GCCGGGGAGT 2160 

GGCCATGTGT GGCGAGGCCG CCTTGAAGCT CGCCTGCAGC A-AGTGACCTC GGGTCGTGGA 2220 

CCGCTGCCCT GCCCCCTCCG CTGCCACCTG GGGCGGCGCG GGCCCGGTGC CCCGGATCGC 2280 

GCGTAGAGCC GGCGCGhTG 2 299 
(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2997 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..2997

(D) OTHER INFORMATION: /note= "MOP1 TERMINAL SEQUENCE"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
TAGCTCTTCC TGAGACCCTG ACCTTTGCGG GGCCACACCT TTCCAAATCT TCGATGTCTC 
ACCATCTAAG TCTCTCACTG CCCACCTTGG CGAGGAGCCA ACAGACCAAC CTCTCCTGAG 
CCTTCCCCTC ACCTCCCCAA CCGGAAGCAT GTAAGGGTTC CAGAAACCTG AGCGTGCAGG 
CAGCTGATGA GCGCCCTTTC CTTCTGGCAC GTGACGGACA AGATCCTACC AGCTACCACA 
GCAAACGCCT AAGAGCAGGA AAAATGTCTG CCAGGAAAGT GTCCATTGGC CACATGGCCC 
CTGGCGCTCT GAGTCTTTGA GGAGTAATCG CAAGCCTCGT TCAGCTGCAG CAGAAGGAAG 
GGCTTAGCCA GGGTGGGCGC TGGCGTCTGT GTTGAAGGA.^ AACCAAGCAG AAGCCACTGT 
AATGATATGT CACAATAAAA CCCATGAATG AAAATGGTTA GGATACAGAT ATATTTTCCT 
AAACAATTTA TCCCCGTTTC TTGGTTTATT CTGACTTTGT AAACAGAAAA GCCGGGGCTG 
TGGAGGATGG AGAGGCCCCT CCTTTCCGTC TCGTCTCGTT GTGTGTGTTT ACCAGACCTG 
CCCAAATCCA GCCTGTAGGG AGGAGGAGGA GGA7GTCTGC TCAGA;^.GAGG CCAGTGAGGG 
ATGTGGCCTC AAAGGGTGTT GGGATG.^GA TGGAGGGAGG TATGCATGCA CACACACACA 
C AC AC AC AC A CACACACACA CATGCATGAT ACACACACAC AC AC AC AC AC ACACACACGA 
TGCACACACA CACACACACA CACACACACA CACACACGCA CGCACGCACG CACACGCACG 
CATGCATGCA CACACACACG CACACACACA TCTGAAGCGC ATGTAGACTT TGG.^ATGGCT 
CTGCCAGTCC CTCAGCCCCA ATTCCTGCCC CATGGTAGGA AP.TCCATGAG ?^S^\GC?^AAG 
CTAACAAGCA CAGCGGACCC TACCTGAGGA AGCACAGGGG ATGCAGGCTC TTCAGGACAC 
TGTCCTCCAA ACAAGGCCCC TCTGGCACCT CTGTGGCCGA GCTCCGGAGC CAGGTCCTGG 
CCTTCACAGC TGCCTCTCTT CACTCTCAAC CCTAACAGAA GGTTCTGCGA CAGATTGGTT 
TCTGGATCTG AGGGAGATGG CAGAACAGGG TTGTACTGGC TTAGAAGGTT CAACCATGCT 
TCCTGCTTCA GAGGTGGGAT GTTGGTTATG GCTCAAACAA GGCCTCTCTG CCTGAGTTTG 
CAGAGCCCCA GCTGCCCCAA TGGTTCCTAG CTTCAAATGC AGAGGGTTAA ACTGGCTGCC 
AGTGTTTCCT GCATCCACAC AAAGAATGAG GTTAGCCAGG CAGGACCTAT GGCCATGTCG 
CATCTGGTCA GGTGGGGAAC CAATTCTTCA TGTCTGTGTC CCTGGAAACA CTGGGCTCTC 
TTCTGTTCTG TTTTAGTTTT TCTTCTTCAG TAGCTTGGGC TGCAGCTTCT ACTCTGCCCA 
TTCGATGTGG GGGAAGGCCA TTTCTTTTTG TA.2^TTTGTTC TGTGTGTTTG CAGATCTGGG 
GCTTTTTGTG TGACTCCCCT GTGGTGCACA TTTTACTTTA GAGCCCTAGT CTGCCTGCAG
TCGGTGTCTC TTATACGTTT AAATGTGTAA ATAGTTGTGA CAAGACAAAG AAATTATTTA 
TTTCCATCTG AAGCTCTTTC CAAAGGCTCC TCACAGAGAA CAATGAGGCC GACTTCCTTC 
AGTCTGTTTG TTTTCTTATT TAAGACTATT TATTAACAGT TGGACCGATG TACCCATAGC 
TGTCGAATAA AGTGGTCCTT AGTGAAAATT CTGTATAAAT AGAGTAAGAA GGGGTTTGAC 
TTTGCAATAA AAGGAGACAT TTGGTTCTGG TTGTCCGACC CATGTGTGTA TTTGTGTCTT 
TCCCCCTGAA CTCCTGGACA CTGGAGTCTC ATCGGCTGAG AACCCTCGAC CTTGATCTCG 
ACTGTTAACG GGATGTTTAT CATCCAGGCC CAGGGC^lAGT CGGGCGCTCC TCAATATTTG 
GTGCAGCTGT GTGGGGCTCC CTGGGCGGGA GAGACGGAAC CAAACAACAA ATGTGAGTTT 
GGTAAGGCTG GATGGCAAAG AGTGCCTTTG ATTGAACTAC AGCCCAGCTG TCAGCAGCTG 
CTTCAAAGAG GCAGGGGGTA AATTAGCTGT GTTTACTGCT AACATAGTCG AAAGATTTAG 
TCATCCCAAT AAAATAGAGG CACAAGAGAG AAGAGGGGGG GGTGTATACC CCAAACTTGA 
AAGCCATGCT GGCCTCACAG CTGGCGTCAT TCAGTGCCCG TCACACCCGG GCAGTTGGGG 
GCTGCCCTCG CAGGCCAAGC TGTGGAGGTG GGCAGCCCAC CGCAGGCTGG AGAAGGGAGT 
GCCCCCCACC TCCCCGGCAA GCTCAGGGCA GTGCTCATCT GGCTACATCG GTCTTTGAAG 
TGCGCACGAA GGTCACCTGA CGGATGTTTC TAGAATCCCA GGCGATGCTT GGG AC AGGCT 
GCTCTCTCTT CCCCTGTTGA CTCAGACCCA GC^iACCCAGC CGTCCTA-^CA CATTCCAGCC 
CCTGCGATTT CTAAACCTTT CCTGTCACTG TCCCGACAAC TCAGCTTTTG TTCTGTTTTC 
CAGGCTGAAG CCCAGAGCCA CAAGCCGGAG GGTCCAGATG TGGCCTCTCA GATGTGTGCC 
TTAGCCTCTC AACCCCACCC CCACCCCCAA. CCCCAGTGAT GTTTACACAT CTTA-^kAAAAC 
ACTAATCTGT TGCCAATATG TTTTTGCAAA TAAGGAGTTT GGGCTTCTCT TGAGCGGGCC 
ACCTGGTTCC TCCCTGTGTG CTGCTCCTAA CTGAACAGAG GTGCCAGGGC CGTTGTCACA 
CATACACACA CCCCCGCCAT GGCCTCATCC ACAAACGGTC GAGGTCAGCT GACATCTTCA 
AAATGGCTGA CGGATGTCTA CTTGTGCCCA CGACCCAAAA GGAATAGGAA AATGGAA 
(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..9

(D) OTHER INFORMATION: /note= "WT1/EGR CONSENSUS SEQUENCE"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
GNGNGGGNG 
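
The consensus of SEQ ID NO: 4 uses N at four positions, which in the IUPAC nucleotide code stands for any base. As a minimal sketch of how such a consensus could be located in a longer sequence, the following Python fragment translates IUPAC codes into a regular expression and reports 1-based match positions. The function names are hypothetical, and only the given strand is scanned; scanning the reverse complement would be a separate step.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC_CLASSES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Compile a consensus into a lookahead regex so overlapping sites are found."""
    body = "".join(IUPAC_CLASSES[code] for code in consensus.upper())
    return re.compile("(?=(" + body + "))")


def find_sites(sequence, consensus):
    """Return the 1-based start position of every match on the given strand."""
    pattern = consensus_to_regex(consensus)
    return [match.start() + 1 for match in pattern.finditer(sequence.upper())]
```

A call such as `find_sites(seq, "GNGNGGGNG")`, with `seq` holding one of the listed sequences after OCR damage has been repaired, would list candidate positions under these assumptions.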

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..21

(D) OTHER INFORMATION: /note= "WT1/EGR HUMAN TCC BINDING SITE"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
TCCTCCTCCT CCTCCTCCTC C 
(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..15

(D) OTHER INFORMATION: /note= "WT1/EGR MOUSE TCC BINDING SITE"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
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(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..9

(D) OTHER INFORMATION: /note= "HUMAN FTZ BINDING SITE"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
TCAAGGTCA 
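
As an illustrative aside, and not part of the sequence listing itself: a short binding site such as the one above may occur on either strand of a non-coding fragment, so a search usually also checks the reverse complement. The following is a minimal Python sketch under stated assumptions. The function names `reverse_complement` and `find_site_both_strands` and the test sequences are hypothetical, and only exact matches are reported.

```python
# SEQ ID NO: 7 as listed.
FTZ_SITE = "TCAAGGTCA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def find_site_both_strands(sequence, site=FTZ_SITE):
    """Return sorted (position, strand) pairs for exact matches on either strand.

    Positions are 1-based and always refer to the leftmost base of the match
    on the strand as given, including for "-" strand hits.
    """
    sequence = sequence.upper()
    hits = []
    for strand, motif in (("+", site.upper()), ("-", reverse_complement(site))):
        start = sequence.find(motif)
        while start != -1:
            hits.append((start + 1, strand))
            start = sequence.find(motif, start + 1)
    return sorted(hits)

# Hypothetical test sequences, not taken from the listing.
print(find_site_both_strands("GGTCAAGGTCATT"))
print(find_site_both_strands("AATGACCTTGAAA"))
```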

What is claimed is:

1. A vector comprising a DNA sequence defining a reporter gene
in operative association with at least one OP1-specific
non-coding sequence lying contiguous to the OP1 gene under
naturally-occurring conditions and competent to affect
expression of said reporter gene on said vector.

2. The vector of claim 1 wherein said non-coding sequence is
capable of being acted on by a nucleic acid binding
molecule, thereby to affect expression of said reporter
gene.

3. The vector of claim 1 wherein said non-coding sequence is
selected from the group of DNA sequences defined by bases
3170 to 3317 (Seq. ID No. 1); 3020 to 3317 (Seq. ID No. 1);
2790 to 3317 (Seq. ID No. 1); 2548 to 3317 (Seq. ID No. 1);
2150 to 2296 (Seq. ID No. 2); 2000 to 2296 (Seq. ID No. 2);
1788 to 2296 (Seq. ID No. 2); 1549 to 2296 (Seq. ID No. 2),
including allelic, species and other sequence variants
thereof.

4. The vector of claim 1 wherein said non-coding sequence is
selected from the group of DNA sequences defined by bases
2300 to 3317 (Seq. ID No. 1); 1300 to 3317 (Seq. ID No. 1);
1 to 3317 (Seq. ID No. 1); 2548 to 2790 (Seq. ID No. 1);
1549 to 2790 (Seq. ID No. 1); 1 to 2790 (Seq. ID No. 1); 800
to 2296 (Seq. ID No. 2); 1 to 2296 (Seq. ID No. 2); 1549 to
1788 (Seq. ID No. 2); 800 to 1788 (Seq. ID No. 2); 1 to 1788
(Seq. ID No. 2), including allelic, species and other
sequence variants thereof.

5. The vector of claim 1 wherein said non-coding sequence is
defined by part or all of Seq. ID No. 3, including allelic,
species and other sequence variants thereof.

6. The vector of claim 1 wherein said non-coding sequence
comprises part or all of an OP1 intron sequence.

7. The vector of claim 6 wherein said sequence defining part or
all of an OP1 intron is selected from the group of sequences
consisting of bases 3736 to 10700; bases 10897 to 11063; 
bases 11217 to 11424; bases 11623 to 13358; bases 13440 to 
10548; bases 15166 to 17250; all of Seq. ID No. 1, including 
allelic, species and other sequence variants thereof. 

8. The vector of claim 1 wherein said vector comprises at least
a second said non-coding sequence. 

9. The vector of claim 8 wherein said second non-coding
sequence is independently selected from the group of
sequences defined in claims 2, 3, 4, 5 or 7.

10. The vector of claim 1 wherein said non-coding sequence
defines at least one Wt-1/Egr consensus binding element.

11. The vector of claim 1 wherein said non-coding sequence
defines between one and six Wt-1/Egr binding elements.

12. The vector of claim 1 wherein said non-coding sequence
defines at least part of an FTZ binding element.

13. The vector of claim 1 wherein said non-coding sequence 
defines a steroid binding element. 

14. A cell transfected with a vector of any of claims 1, 3, 4,
5, 6, 10, 11 or 12.

15. The transfected cell of claim 14 wherein at least part of 
the DNA of said vector is operatively integrated into the 
cellular genome. 

16. The transfected cell of claim 15 wherein said cell's genome
has an OP-1 gene locus and at least part of said transfected
DNA is operatively integrated into said genome at said OP-1
locus.

17. The transfected cell of claim 14 wherein said cell expresses
OP1 under naturally-occurring conditions.

18. The transfected cell of claim 17 wherein said cell is an
epithelial cell.

19. The transfected cell of claim 17 wherein said cell is of
kidney, renal, urogenital, liver, bone, cardiac, lung, or
nerve cell origin.

20. A cell comprising a transfected vector, said vector defining
a reporter gene in operative association with at least two
DNA sequences,

the first said sequence comprising part or all of a sequence
selected from the group consisting of bases 2548 to 2790
(Seq. ID No. 1); bases 2548 to 3317 (Seq. ID No. 1); bases
1549 to 1788 (Seq. ID No. 2); bases 1549 to 2296 (Seq. ID
No. 2), including allelic, species and other sequence
variants thereof, and

the second said sequence defining a sequence capable of 
being acted on by a DNA binding molecule and competent to 
affect expression of said reporter gene. 

21. The cell of claim 20 wherein said second DNA sequence
comprises at least one Wt-1/Egr-1 consensus element (Seq. ID
No. 4).

22. The cell of claim 21 wherein said second DNA sequence
comprises between one and six Wt-1/Egr-1 consensus elements
(Seq. ID No. 4).

23. The cell of claim 21 wherein said second DNA sequence
comprises at least six Wt-1/Egr-1 consensus elements (Seq.
ID No. 4).

24. The cell of claim 20 wherein said second DNA sequence is
selected from the group of sequences consisting of a TCC
element, an FTZ binding element and a steroid binding
element.

25. The cell of claim 20 further comprising a third DNA sequence
in operative association with said reporter gene and
competent to affect expression of said gene, said third DNA
sequence being independently selected from the group of
sequences consisting of a TCC element, an FTZ binding
element and a steroid binding element.

26. The cell of claim 20 further comprising a third DNA sequence 
in operative association with said reporter gene and 
competent to affect expression of said gene, said third DNA 
sequence being independently selected from the group of 
sequences consisting of bases 3736 to 10700 (Seq. ID No. 1); 
bases 10897 to 11063 (Seq. ID No. 1); bases 11217 to 11424 
(Seq. ID No. 1); bases 11623 to 13353 (Seq. ID No. 1); bases 
13440 to 10548 (Seq. ID No. 1); bases 15166 to 17250 (Seq. 
ID No. 1), including allelic, species and other sequence 
variants thereof.

27. A method for screening a candidate compound for the ability 
to modulate expression of OP-1, said method comprising the 
steps of:

(a) incubating a said candidate compound with a cell
transfected with a vector comprising a DNA sequence
defining a reporter gene in operative association with
at least one OP1-specific non-coding sequence lying
contiguous to the OP1 gene under naturally-occurring
conditions and competent to affect expression of said
reporter gene on said vector;

(b) measuring the level of reporter gene expressed in said
cell; and

(c) comparing said level with that of said reporter gene
expressed in said cell in the absence of said
candidate compound, wherein an increase in reporter
gene expression level is indicative of said
candidate's ability to increase OP-1 expression in
vivo, and a decrease in reporter gene expression level
is indicative of the candidate's ability to inhibit
OP-1 expression in vivo.

28. The method of claim 27 wherein said non-coding sequence is 
capable of being acted on by a nucleic acid binding 
molecule, thereby to affect expression of said reporter
gene.

29. The method of claim 27 wherein said non-coding sequence is 
selected from the group of DNA sequences defined by bases 
3170 to 3317 (Seq. ID No. 1); 3020 to 3317 (Seq. ID No. 1); 
2790 to 3317 (Seq. ID No. 1); 2548 to 3317 (Seq. ID No. 1);
2150 to 2296 (Seq. ID No. 2); 2000 to 2296 (Seq. ID No. 2);
1788 to 2296 (Seq. ID No. 2); 1549 to 2296 (Seq. ID No. 2),
including allelic, species and other sequence variants
thereof.

30. The method of claim 27 wherein said non-coding sequence is 
selected from the group of DNA sequences defined by bases 
2300 to 3317 (Seq. ID No. 1); 1300 to 3317 (Seq. ID No. 1);
1 to 3317 (Seq. ID No. 1); 2548 to 2790 (Seq. ID No. 1);
1549 to 2790 (Seq. ID No. 1); 1 to 2790 (Seq. ID No. 1); 800
to 2296 (Seq. ID No. 2); 1 to 2296 (Seq. ID No. 2); 1549 to
1788 (Seq. ID No. 2); 800 to 1788 (Seq. ID No. 2); 1 to 1788
(Seq. ID No. 2), including allelic, species and other
sequence variants thereof.

31. The method of claim 27 wherein said non-coding sequence is
selected from the group of sequences defined by claims 5, 6
or 7.

32. A method for screening a candidate compound for the ability 
to modulate expression of OP-1, said method comprising the 
steps of:

(a) incubating a said candidate compound with a cell 
according to claim 20, 21, 24 or 25; 

(b) measuring the level of reporter gene expressed in said 
cell; and 

(c) comparing said level with that of said reporter gene 
expressed in said cell in the absence of said 
candidate compound, wherein an increase in reporter 
gene expression level is indicative of said 
candidate's ability to increase OP-1 expression in
vivo, and a decrease in reporter gene expression level
is indicative of the candidate's ability to inhibit
OP-1 expression in vivo.

33. A compound that is identified by the method of claim 27 or 
32. 

34. A substantially pure nucleic acid comprising a DNA sequence 
defined by bases 1 to 1871 of Seq. ID No. 2, including 
allelic, species and other sequence variants thereof. 

35. A substantially pure nucleic acid comprising a DNA sequence 
defined by bases 1 to 2397 of Seq. ID No. 3, including
allelic, species and other sequence variants thereof. 

36. The vector of claim 1, 3, 4, 5 or 7 further comprising part 
or all of a nucleotide sequence encoding an OP1 pro protein
in operative association with said reporter gene.

37. The method of claim 27 wherein said vector further comprises
part or all of a nucleotide sequence encoding an OP1 pro
protein in operative association with said reporter gene.

38. A method for producing a candidate compound having the
ability to modulate OP-1 expression in a cell, the method
comprising the steps of:

(a) obtaining, by the method of claim 27, a candidate 
compound, and 

(b) producing either said candidate compound, or a 
derivative thereof having substantially the same OP-1 
expression modulating ability as said candidate. 

39. The method of claim 38 wherein said candidate compound, or 
derivative thereof, produced in step (b) is by recombinant 
DNA techniques, or by nonbiological peptide synthesis. 

40. A kit for identifying a candidate molecule capable of 
modulating OP-1 expression in a cell, the kit comprising: 

(a) a receptacle adapted to receive a sample, said sample
containing a vector encoding a DNA sequence comprising
a reporter gene in operative association with at least
one OP-1-specific non-coding sequence lying contiguous
to the OP-1 gene under naturally-occurring conditions
and competent to affect expression of said reporter
gene, wherein said vector is carried in a cell,

(b) means for detecting expression of said reporter gene 
following exposure of a said candidate compound to 
said sample. 

41. The kit of claim 40 wherein said reporter gene comprises an
OP-1 DNA sequence. 
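
As an illustrative aside, and not part of the claims: the comparison recited in step (c) of the screening methods above can be sketched as a small calculation on two measured reporter levels. The following is a minimal Python sketch under stated assumptions. The function name `classify_candidate`, the returned labels, and the `tolerance` parameter are hypothetical. The claims specify only an increase or a decrease relative to the level measured in the absence of the candidate compound, with no threshold.

```python
def classify_candidate(treated_level, control_level, tolerance=0.0):
    """Compare reporter expression with and without the candidate compound.

    treated_level: reporter level measured in cells incubated with the candidate.
    control_level: reporter level measured in the same cells without it.
    tolerance: hypothetical relative margin (an assumption, not in the claims)
        inside which a difference is reported as "no change".
    """
    if control_level <= 0:
        raise ValueError("control_level must be positive")
    ratio = treated_level / control_level
    if ratio > 1 + tolerance:
        return "increases"   # candidate may increase OP-1 expression
    if ratio < 1 - tolerance:
        return "inhibits"    # candidate may inhibit OP-1 expression
    return "no change"

# Hypothetical reporter readings in arbitrary units.
print(classify_candidate(250.0, 100.0))
print(classify_candidate(40.0, 100.0))
print(classify_candidate(105.0, 100.0, tolerance=0.1))
```

In practice the margin would be chosen from replicate measurements of the reporter assay in use.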